thermalctld: Add support for fans on non-CPU modules #555
@bmridul and @mlok-nokia, can you help check this? Does this need a SONiC change, or should we push this into the platform implementation?
LGTM
Just want to highlight that this change depends on the PR mentioned by the author in the PR description:
@bmridul, @mlok-nokia ping again
I like this change! But at the same time, I think support should be added to include fans from all possible locations, not just add
This already exists today; any fan in a fan drawer on the chassis is already displayed. This PR is just adding missing output for fans that are attached to modules.
I'm referring specifically to fans not in a physical fan drawer (adding
        except Exception as e:
            self.log_warning('Failed to update fan status - {}'.format(repr(e)))

        for module_index, module in enumerate(self.chassis.get_all_modules()):
Thermals first check is_chassis_system before looping over module details; should that be done here as well? Or is it perhaps an unnecessary check there? https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-thermalctld/scripts/thermalctld#L599 That also looks at PSUs connected to the modules, which may themselves have fans.
The PR title describes this as only for non-CPU modules, but it looks like that is not actually a limitation; won't this get all fans in any module where they are configured? Not necessarily a bad thing, but worth noting.
I think these two PRs can be pushed independently. PR20603 is just Arista's platform-specific code.
Discussed during the chassis meeting: this should be implemented in a follow-up PR, as it widens the scope to affect not just chassis but also fixed systems.
@assrinivasan please take a quick look at this PR to see if you have any concerns, especially for non-chassis platforms, while @prgeor is out of office.
@patrickmacarthur Can you please confirm this change will not break the CLI on single-ASIC platforms?
* thermalctld: Add support for fans on non-CPU modules
* Add module fan to unit tests
…evice is in detaching mode (#546)
* Skip logging the warning, if device is in detaching mode
* Add detach_info table and unittests
* Fix unit tests
* Increase code coverage
* Remove unused header import
* Fix dict get values
* Increase code coverage
* Increase test coverage
* [SmartSwitch] Extend implementation of the DPU chassis daemon. (#563)
* Addition of DPU Chassis for thermalctld (#564)
* [stormond] Added new dynamic field 'last_sync_time' to STATE_DB (#535)
* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB
* Moved 'start' message to actual starting point of the daemon
* Added functions for formatted and epoch time for user friendly time display
* Made changes per prgeor review comments
* Pivot to SysLogger for all logging
* Increased log level so that they are seen in syslogs
* Code coverage improvement
* [lag_id] Add lagid to free_list when LC absent for 30 minutes (#542)
When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST
This PR works with the following 2 PRs: sonic-net/sonic-swss#3303 sonic-net/sonic-buildimage#20369
Signed-off-by: mlok <[email protected]>
* Fixed bug in chassisd causing incorrect number of ASICs in CHASSIS_STATE_DB (#560)
Fixed the bug in chassisd due to which incorrect number of ASICs were being pushed to CHASSIS_STATE_DB.
* thermalctld: Add support for fans on non-CPU modules (#555)
* thermalctld: Add support for fans on non-CPU modules
* Add module fan to unit tests
* Advanced Azure pipeline to Bookworm (#572)
Description
This PR advances the azure pipeline on sonic_platform_daemons from bullseye to bookworm. This fixes the issue where sonic-platform-daemons azp is having some issues due to upgrade to bookworm. See Pipelines - Run 20241210.8 logs for details.
* Take non-CMIS xcvrs out of lpmode in SFF Manager (#565)
Description
Fix non-CMIS transceivers in down state by bringing them out of low power mode in the SFF Manager Task. This is intended to work together with the change in sonic-net/sonic-buildimage#20886.
Motivation and Context
Non-CMIS transceivers were not functioning correctly when put into Low Power mode. So XCVRD now brings them out of lpmode.
How Has This Been Tested?
Loaded an image containing this change alongside the change from sonic-net/sonic-buildimage#20886 on an Arista chassis containing a Clearwater2 linecard. Verified that without this image some interfaces were in a down state but with the image all interfaces came up as expected.
* Added SmartSwitch support in chassisd and enabling chassisd (#467)
Added SmartSwitch support in chassisd and enabling chassisd
* [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function (#576)
Description
Move the PSU parent information generation to the loop run function from the initialization function
Motivation and Context
Fixes #575
How Has This Been Tested?
Tested on Cisco chassis, the PHYSICAL_ENTITY_INFO|PSU * can be re-inserted after thermalctld restart. And monitored the stated db for memory for hours, works well:
* [chassisd] Address the chassisd crash issue and add UT for it (#573)
Description
On Nokia platform, slot name of Supervisor is string "A" instead of a number. Using "int" to convert it could cause issue backtrace. We should use slot value to any checking without any conversion. This will fixes sonic-net/sonic-buildimage#21131
Motivation and Context
Modify the _get_module_info not to convert "slot" to a string value. And also modify the code not to convert slot value to an to do any checking. Just directly use the returned value of get_slot(). Also add UT test_moduleupdater_check_slot_string() to valid it.
How Has This Been Tested?
Tested on 202405 branch
Signed-off-by: mlok <[email protected]>
* Fix a comment
---------
Signed-off-by: mlok <[email protected]>
Co-authored-by: Oleksandr Ivantsiv <[email protected]>
Co-authored-by: Gagan Punathil Ellath <[email protected]>
Co-authored-by: Ashwin Srinivasan <[email protected]>
Co-authored-by: Marty Y. Lok <[email protected]>
Co-authored-by: Vivek Verma <[email protected]>
Co-authored-by: Patrick MacArthur <[email protected]>
Co-authored-by: Peter Bailey <[email protected]>
Co-authored-by: rameshraghupathy <[email protected]>
Co-authored-by: Jianquan Ye <[email protected]>
Description
This adds support to the show platform fans command to show fans that are on modules.
Motivation and Context
In the current Arista chassis model, the chassis fans are returned by Module.get_all_fans() as opposed to FanDrawer.get_all_fans(), which thermalctld currently makes no provision for. This change allows fans that are under the modules to be listed in the command output.
How Has This Been Tested?
This has been tested internally on a chassis, and the fan output now includes all fans on the chassis as opposed to just PSU fans:
Additional Information (Optional)
Platform library support change sonic-net/sonic-buildimage#20929 should be merged before this change.
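For readers who want to see the shape of the traversal described under Motivation and Context, here is a minimal sketch of walking each module and collecting its fans. It is not the thermalctld code from this PR: StubFan, StubModule, StubChassis and collect_module_fans are hypothetical names made up for illustration, and only the method names get_all_modules(), get_all_fans(), get_name() and get_speed() are meant to mirror the platform API mentioned in this thread. The per-fan try/except loosely follows the warning pattern visible in the diff excerpt above.

```python
# Illustrative sketch only: the stub classes are hypothetical stand-ins
# for platform objects, not the real sonic_platform implementations.

class StubFan:
    def __init__(self, name, speed):
        self._name = name
        self._speed = speed

    def get_name(self):
        return self._name

    def get_speed(self):
        return self._speed


class StubModule:
    def __init__(self, name, fans):
        self._name = name
        self._fans = fans

    def get_name(self):
        return self._name

    def get_all_fans(self):
        return list(self._fans)


class StubChassis:
    def __init__(self, modules):
        self._modules = modules

    def get_all_modules(self):
        return list(self._modules)


def collect_module_fans(chassis):
    """Return (module name, fan name, speed) for every fan on every module.

    A fan whose status cannot be read is reported and skipped, so one
    bad fan does not hide the rest.
    """
    rows = []
    for module in chassis.get_all_modules():
        module_name = module.get_name()
        for fan in module.get_all_fans():
            try:
                rows.append((module_name, fan.get_name(), fan.get_speed()))
            except Exception as e:
                print('Failed to update fan status - {}'.format(repr(e)))
    return rows


if __name__ == '__main__':
    # Hypothetical module and fan names, chosen only for the demo.
    demo = StubChassis([
        StubModule('FABRIC-CARD0', [StubFan('fan1', 60), StubFan('fan2', 62)]),
        StubModule('LINE-CARD0', []),
    ])
    for row in collect_module_fans(demo):
        print(row)
```

As noted in review, a loop like this picks up fans on any module that reports them, not only non-CPU modules; whether to guard it with a chassis check is the open question raised above.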