-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[techsupport] improve robustness #2117
[techsupport] improve robustness #2117
Conversation
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
…o robust-techsupport Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
# run 'hw-management-generate-dump.sh' script and save the result file | ||
HW_DUMP_FILE=/usr/bin/hw-management-generate-dump.sh | ||
if [ -f "$HW_DUMP_FILE" ]; then | ||
${CMD_PREFIX}${timeout_cmd} /usr/bin/hw-management-generate-dump.sh $ALLOW_PROCESS_STOP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if timeout expired and file does not exist, we should cont but IMO should have an error message identify timeout expired and no hw mgmt file. .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: Stepan Blyschak <[email protected]>
@yxieca , @qiluo-msft could you please help to signoff? |
- What I did Fixed error message: "Failed to get routing stack: a bytes-like object is required, not 'str'" introduced by #2117 - How I did it Add missing 'text' parameter - How to verify it Run on switch Signed-off-by: Stepan Blyschak <[email protected]>
- What I did Execute the command in a separate bash process under timeout. Due to with '--foreground' the child processes are not killed and still have the file descriptor opened to piped process which will basically hang forever. From the "timeout" man page: --foreground when not running timeout directly from a shell prompt, allow COMMAND to read from the TTY and get TTY signals; in this mode, children of COMMAND will not be timed out So, a command like this: timeout --foreground 5s some_process_that_spawn_childs_that_produce_output | tee some_file Will actually hang forever. The '--foreground' option is still needed. So the proposed solution is to run that construct in another bash process: timeout --foreground 5s bash -c "some_process_that_spawn_childs_that_produce_output | tee some_file" Moved hw-mgmt-dump to collect_mellanox function and execute it under timeout. Handle global timeout: When global timeout is reached the SIGTERM is sent to generate_dump process, upon SIGTERM the generate_dump process will finalize its work by creating a dump of what already has been collected, providing a dump even when timeout is reached. The generate_dump will be killed in case it cannot create dump after initial SIGTERM. - How I did it I implemented the above mentioned features. - How to verify it Artificially hang the FW, execute "show techsupport" and verify even in case timeout is reached the dump is generated. Signed-off-by: Stepan Blyschak <[email protected]>
- What I did Fixed error message: "Failed to get routing stack: a bytes-like object is required, not 'str'" introduced by #2117 - How I did it Add missing 'text' parameter - How to verify it Run on switch Signed-off-by: Stepan Blyschak <[email protected]>
@stepanblyschak @liat-grozovik |
Signed-off-by: Stepan Blyschak [email protected]
What I did
From the "timeout" man page:
So, a command like this:
Will actually hang forever. The '--foreground' option is still needed. So the proposed solution is to run that construct in another bash process:
How I did it
I implemented the above mentioned features.
How to verify it
Artificially hang the FW, execute "show techsupport" and verify even in case timeout is reached the dump is generated.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)