-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(must-gather): Added must-gather scripts #261
Conversation
42021dd
to
d62db45
Compare
must-gather/gather
Outdated
esac | ||
|
||
get_kepler_instance() { | ||
log "getting kepler instance" >> "$LOGFILE_PATH" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about log
itself writes / appends to $LOGFILE
?
WORKDIR / | ||
COPY --from=builder /workspace/manager . | ||
COPY --from=builder /workspace/must-gather/* /usr/bin/ | ||
COPY --from=origincli /usr/bin/oc /usr/bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it also provide kubectl as well? could we copy that as well?
export KEPLER_NS="openshift-kepler-operator" | ||
export LOGFILE_NAME="gather-debug.log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help with maintenance (and to make debugging easier) it is better to avoid side-effects as much as possible when sourcing files. Ideally, it is better if the file only included variables (esp globals) which is only used by the functions in that file. Hence I would recommend moving KEPLER_NS
to individual files. Yes we are repeating ourselves but I believe that is the lesser evil :)
If you notice the use of LOGFILE_NAME
, is unused in this file while a LOGFILE_PATH
is required by the functions and that must be defined elsewhere. I think it may be easier to encapsulate this by requiring the caller to invoke and init_log
function that sets the LOGFILE_PATH
and makes it readonly soon after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me buy I will get @vprashar2929 merge if after the validation is done.
get_subscription | ||
get_catalogsource | ||
get_install_plan | ||
get_csv | ||
get_kepler_operator_deployment_info | ||
get_kepler_operator_pod_info | ||
get_summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add || true to every step ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested it. even with oc command throwing error,. the execution continues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set -e
should have stopped the script. Could you please try again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good that i didn't add it then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will verify this but we should always ensure that these steps break (exit with non-zero value) due to the nature of bash. E.g. say there is a typo in any of the command above, we should be able to find that outduring development we would comment out # || true
in the optional steps. get_catalog_source
should be optional since that is will only be present if you run bundle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have to deal with so much complexity, i would rather have must-gather written as a go program. oc adm must-gather
does not care if it is a script or a binary.
get_kepler_sa | ||
get_kepler_scc | ||
gather_kepler_exporter_info | ||
gather_olm_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we begin with capturing olm and operator info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test scenario, I installed only the operator (imagine the operator installation failed) and ran the must gather.
❯ less after-install/quay-io-sthaha-kepler-operator-sha256-df656da3b9f7a0c4310f8a44cb7b11eb790a7be43794e67c1343e934953f6478/gather-debug.log
2023-10-05 01:23:52 getting kepler instance
2023-10-05 01:23:52 oc get keplers.kepler.system.sustainable.computing.io kepler -oyaml
Error from server (NotFound): keplers.kepler.system.sustainable.computing.io "kepler" not found
And thats all it captures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the script is not stateful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me rephrase, in the scenario where the operator fails to install, i.e. no kepler
CRD in cluster, what information should we capture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if script executes all its steps, we will capture olm info.
must-gather/gather
Outdated
|
||
get_kepler_instance() { | ||
log "getting kepler instance" | ||
run oc get keplers.kepler.system.sustainable.computing.io kepler -oyaml "$BASE_COLLECTION_PATH/kepler.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of Kepler instance is not created the execution gets terminated and the rest of the steps are skipped. I also believe we should add || true
at every step
must-gather/gather
Outdated
echo -e "must-gather logs are located at: '${LOGFILE_PATH}'" | ||
|
||
mkdir -p "${BASE_COLLECTION_PATH}/cache-dir" | ||
export KUBECACHEDIR=${BASE_COLLECTION_PATH}/cache-dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This copies cache-dir which has all the kubectl's discovery and http info, do we need to copy the cache-dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cache-dir is deleted at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
must-gather/gather
Outdated
case ${1-} in | ||
--help | -h ) | ||
print_usage | ||
exit 1 | ||
esac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case ${1-} in | |
--help | -h ) | |
print_usage | |
exit 1 | |
esac | |
case ${1-} in | |
--help | -h ) | |
print_usage | |
return 0 | |
esac |
Since the help was explicitly called.
} | ||
|
||
get_summary() { | ||
run oc -n "$KEPLER_OPERATOR_NS" get catalogsource kepler-operator-catalog -owide "$KEPLER_OPERATOR_INFO_DIR/summary.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the scenario where operator is installed from community-catalog
, get_summary
breaks here and the script terminates execution
2023-10-05 01:24:48 oc -n openshift-operators get subscription -l operators.coreos.com/kepler-operator.openshift-operators= -oyaml
2023-10-05 01:24:48 collecting catalogsource info for kepler-operator
2023-10-05 01:24:48 oc -n openshift-operators get catalogsource kepler-operator-catalog -oyaml
Error from server (NotFound): catalogsources.operators.coreos.com "kepler-operator-catalog" not found
~
❯ ls kepler-operator-info
kepler-operator-catalogsource.yaml kepler-operator-subscription.yaml
echo -e "\n" >> "$KEPLER_OPERATOR_INFO_DIR/summary.txt" | ||
|
||
run oc -n "$KEPLER_OPERATOR_NS" get csv -owide "$KEPLER_OPERATOR_INFO_DIR/summary.txt" | ||
echo -e "\n" >> "$KEPLER_OPERATOR_INFO_DIR/summary.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: how about we add \n
in run
function itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting change to only fix the common scenario where operator is installed from community-catalog and running must-gather fails to run to completion.
Signed-off-by: Vimal Kumar <[email protected]>
script runs to completion |
Thie PR adds
must-gather
scripts for kepler-operatorSample execution on single node CRC cluster: