Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor redfish login and logout methods. #64

Merged
merged 3 commits into from
Mar 8, 2024

Conversation

chanchiwai-ray
Copy link
Contributor

@chanchiwai-ray chanchiwai-ray commented Mar 6, 2024

The redfish context manager could potentially failed during __enter__ which might causes the __exit__ function never called. We should ensure the redfish client is logged out in any exception. This can be achieved by adding a finally block to the try-except block.

Note that even if the login() crashes, the server might still recognize the login as successful, and a force logout might not work since we did not get the response from the initial login(). For example, if a login is timed out, and the exporter service raise and catch the exception, but the response from the login() will not set the session_key or session_location in the response object (since we never get back a response), so a force logout() will not be able to perform a DELETE (using the session_location) on the server to close the connection. This is an inherent issue on the redfish library that probably cannot be easily fixed

Closes: canonical/hardware-observer-operator#136

@chanchiwai-ray chanchiwai-ray requested a review from rgildein March 6, 2024 09:33
@chanchiwai-ray
Copy link
Contributor Author

chanchiwai-ray commented Mar 6, 2024

Note that I edit my PR description to indicate a potential loop hole that we can't really fix. The workaround is to increase the redfish timeout so that the case I mentioned can be significantly reduced.

rgildein
rgildein previously approved these changes Mar 6, 2024
Copy link
Contributor

@rgildein rgildein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not agree with defining the context manager, but not using it, since it can fail. But I do not see this blocker issue for now.

prometheus_hardware_exporter/collector.py Show resolved Hide resolved
@chanchiwai-ray
Copy link
Contributor Author

Do not agree with defining the context manager, but not using it, since it can fail. But I do not see this blocker issue for now.

We shouldn't simply remove it because we don't use it, since it will cause a backward compatibility issue (just in case other users are using this repo as a library).

@rgildein
Copy link
Contributor

rgildein commented Mar 6, 2024

Do not agree with defining the context manager, but not using it, since it can fail. But I do not see this blocker issue for now.

We shouldn't simply remove it because we don't use it, since it will cause a backward compatibility issue (just in case other users are using this repo as a library).

I never suggested that, but if anyone is using as lib, they will hit same issue and the fix will be same as you are doing "not using context manager".

@dashmage
Copy link
Contributor

dashmage commented Mar 6, 2024

After reading through the comments and thinking about this,

  • Currently -- we have a context manager to manage the login/logout for redfish as part of the RedfishHelper class. We also need to have a try/except block around the context manager itself to catch exceptions while running the various helper methods for redfish (get_chassis_data, get_storage_drive_data etc.) inside it.
  • The main issue is when an exception occurs while executing redfish_obj.login which will cause __enter__ to raise it and __exit__ isn't called. Which is supposed to be fine...but somehow there's still a login session being created so not logging out is a problem. As mentioned in the existing comments already, there does seem to be an upstream bug lurking here.
  • All we'd need to do is force a logout -- this can happen with a try/except in __enter__ similar to what @gabrielcocenza proposed in this comment. Or we could just directly call the login/logout methods in collector.py so there's just one set of try/except/finally statements and we can ignore the context manager -- which is Ray's implementation in the PR now. I'm personally in favor of the second option.
  • I don't quite understand how if the redfish_obj.login doesn't work and the session_{key,location} aren't set, the server still registers a login. If redfish_obj.logout itself doesn't work for some reason to clean up the session, then the fix would certainly have to come from the upstream redfish library. We could potentially check for existing sessions and perform cleanup manually but then it would run the risk of removing legitimate sessions by actual users since we wouldn't have any way of identifying the sessions created by the charm.

Please correct me if I'm wrong in any of the assumptions in my summary.

@chanchiwai-ray
Copy link
Contributor Author

I don't quite understand how if the redfish_obj.login doesn't work and the session_{key,location} aren't set, the server still registers a login. If redfish_obj.logout itself doesn't work for some reason to clean up the session, then the fix would certainly have to come from the upstream redfish library. We could potentially check for existing sessions and perform cleanup manually but then it would run the risk of removing legitimate sessions by actual users since we wouldn't have any way of identifying the sessions created by the charm.

For example if you set a login timeout to 1s, and you think your server can handles the login in 1s, that's fine. However, if the server take too long to respond (say 5s), then redfish_client raise time out exception, and can't handle the response which sets the self.__session_key. In that case, even if we do redfish_client.logout [1] it will do nothing.

[1] https://github.com/DMTF/python-redfish-library/blob/main/src/redfish/rest/v1.py#L1027

redfish context manager could potentially failed during __enter__ which
might causes the __exit__ function never called. We should ensure the
redfish client is logged out in any exception. This can be achieved by
adding a finally block to the try-except block.
@chanchiwai-ray
Copy link
Contributor Author

Rebased onto ba509b6

Copy link
Contributor

@dashmage dashmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks :)

@rgildein rgildein dismissed their stale review March 7, 2024 11:44

I do not want to block this, and I need to focus on other things.

Copy link
Member

@gabrielcocenza gabrielcocenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a small comment that it's not relevant.

prometheus_hardware_exporter/collector.py Show resolved Hide resolved
@chanchiwai-ray chanchiwai-ray merged commit a3e06c5 into canonical:main Mar 8, 2024
3 checks passed
@chanchiwai-ray chanchiwai-ray deleted the bug/bseng-1871 branch March 8, 2024 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scrape jobs never close connection
5 participants