Hardcoded timeout for call of getParameter.py to get operation mode? #183

Open
fdanapfel opened this issue May 23, 2023 · 3 comments

@fdanapfel
Contributor

Hi,

while the timeouts for most calls of HANA binaries and Python scripts were made configurable with commit 7c66a3b, the call that runs getParameter.py to get the operation mode still uses a hardcoded timeout of 10 seconds:

https://github.com/SUSE/SAPHanaSR/blob/maintenance-classic/ra/SAPHana#L2664

Other calls of getParameter.py in the resource agents, however, use the $HANA_CALL_TIMEOUT variable, which makes their timeout configurable.

Is there a specific reason why a hardcoded timeout is used for the getParameter.py call to get the operation mode, or would it be possible to also make the timeout configurable by using the $HANA_CALL_TIMEOUT variable instead?
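
To illustrate the difference being asked about, here is a minimal sketch. It is not the actual resource agent code: the two function names and the 60 second default are assumptions made for illustration, and only `HANA_CALL_TIMEOUT` and the 10 second value come from the description above. It relies on the `timeout` utility from GNU coreutils.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual SAPHana resource agent code.
# HANA_CALL_TIMEOUT mirrors the RA variable named in this issue; the
# default of 60 seconds and both function names are assumptions.
HANA_CALL_TIMEOUT="${HANA_CALL_TIMEOUT:-60}"

# Run a command with a fixed 10 second limit (the hardcoded pattern).
call_hardcoded() {
    timeout 10 "$@"
}

# Run the same command, but take the limit from HANA_CALL_TIMEOUT.
call_configurable() {
    timeout "$HANA_CALL_TIMEOUT" "$@"
}
```

With this shape, switching the operation mode query from the first form to the second would be a one-line change at the call site. GNU `timeout` exits with status 124 when the limit is hit, so callers can tell a timeout apart from a failing command.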

@fmherschel
Member

The first reason was that getParameter.py should always answer very fast. Do you have a realistic situation where getParameter.py did not answer in time? What might be the reason for this? A hanging NFS share? I have not reviewed this at code level yet. My guess is that, unlike systemReplicationStatus.py, where we have a hard argument to stay with the short timeout, we might change that for getParameter.py. But we should also take into account that not all hanging resources can be addressed by the SAPHanaSR* resource agents. In particular, the classic SAPHanaSR resource agents are not independent of the cluster system environment.
Just my first 2ct.

@fdanapfel
Contributor Author

I don't have an actual situation where getParameter.py did not answer in time. I was just asked by some colleagues why there is a hardcoded timeout for the call of getParameter.py in this specific case, whereas other calls of getParameter.py in the resource agents use the configurable timeout.

@fmherschel
Member

fmherschel commented May 23, 2023

I have just reviewed: https://github.com/SUSE/SAPHanaSR/blob/maintenance-classic/ra/SAPHana#L2664
This is "only" the operation mode of the SR. It needs to be acquired only once* before a former primary is registered. So we selected a shorter timeout to prevent overly long RA runtimes caused by adding long timeouts in sequence.
But maybe we should implement a fallback: if getting log-mode times out, the function should keep the old value from an earlier query.

*) We query the status more than once to get updates in case something changes during the cluster runtime.
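
The fallback idea could look roughly like the sketch below. This is only an illustration under assumptions: `op_mode` and `refresh_op_mode` are hypothetical names, the command to run is passed in by the caller, and nothing here is taken from the resource agent's real code. It uses `timeout` from GNU coreutils, which exits with status 124 when the limit is reached.

```shell
#!/bin/bash
# Hypothetical sketch of the fallback; names are placeholders and not
# the resource agent's real functions or variables.
op_mode=""   # last known value; empty until the first successful query

# Usage: refresh_op_mode <timeout_seconds> <command> [args...]
# On success with non-empty output, stores that output in op_mode.
# On timeout (exit status 124) or any other failure, op_mode keeps the
# value from the previous successful query.
# Returns the exit status of the timed command.
refresh_op_mode() {
    local limit="$1"; shift
    local out rc
    out=$(timeout "$limit" "$@")
    rc=$?
    if [ "$rc" -eq 0 ] && [ -n "$out" ]; then
        op_mode="$out"
    fi
    return "$rc"
}
```

The function writes to a global variable instead of printing the value, because calling it inside `$(...)` would run it in a subshell and lose the cached value. A caller would still need to decide what to do when the very first query times out and `op_mode` is empty.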
