thermalctld: Add support for fans on non-CPU modules #555

@patrickmacarthur patrickmacarthur commented Oct 30, 2024

Description

This adds support to the show platform fan command for showing fans that are on modules.

Motivation and Context

In the current Arista chassis model, the chassis fans are returned by Module.get_all_fans() rather than FanDrawer.get_all_fans(), and thermalctld currently makes no provision for that. This change allows fans that sit under modules to be listed in the command output.
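A minimal sketch of the gap described above, using stand-in classes rather than the real sonic_platform_base types. It only illustrates that fans reachable through Module.get_all_fans() are missed by a walk over fan drawers. All Stub* classes and collect_fan_names are hypothetical names for illustration and are not the thermalctld implementation.

```python
class StubFan:
    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


class StubFanDrawer:
    def __init__(self, fans):
        self._fans = fans

    def get_all_fans(self):
        return self._fans


class StubModule:
    def __init__(self, fans):
        self._fans = fans

    def get_all_fans(self):
        return self._fans


class StubChassis:
    def __init__(self, drawers, modules):
        self._drawers = drawers
        self._modules = modules

    def get_all_fan_drawers(self):
        return self._drawers

    def get_all_modules(self):
        return self._modules


def collect_fan_names(chassis, include_module_fans):
    """Hypothetical helper: gather fan names from drawers, optionally from modules."""
    names = []
    # Fans housed in fan drawers (the path that already existed).
    for drawer in chassis.get_all_fan_drawers():
        names.extend(fan.get_name() for fan in drawer.get_all_fans())
    # Fans attached to modules (the path this PR is about).
    if include_module_fans:
        for module in chassis.get_all_modules():
            names.extend(fan.get_name() for fan in module.get_all_fans())
    return names


# A chassis whose fans hang off a module and not off any fan drawer.
chassis = StubChassis(
    drawers=[],
    modules=[StubModule([StubFan("fan0/1"), StubFan("fan0/2")])],
)
before = collect_fan_names(chassis, include_module_fans=False)
after = collect_fan_names(chassis, include_module_fans=True)
```

With only the drawer walk, `before` is empty; once modules are walked too, `after` lists both module fans.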

How Has This Been Tested?

This has been tested internally on a chassis, and the fan output now includes all fans on the chassis as opposed to just PSU fans:

admin@cmp206:~$ show platform fan
  Drawer    LED     FAN    Speed    Direction    Presence    Status          Timestamp
--------  -----  ------  -------  -----------  ----------  --------  -----------------
     N/A    off  fan0/1      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan0/2      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    red  fan0/3      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    red  fan0/4      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    red  fan0/5      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    red  fan0/6      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    red  fan0/7      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    red  fan0/8      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/1      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/2      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/3      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/4      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/5      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/6      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/7      29%      exhaust     Present        OK  20241030 15:54:08
     N/A    off  fan1/8      29%      exhaust     Present        OK  20241030 15:54:08
....
     N/A    off  psu2/1      49%       intake     Present        OK  20241030 15:54:08
     N/A    off  psu4/1      44%       intake     Present        OK  20241030 15:54:08
     N/A    off  psu6/1      44%       intake     Present        OK  20241030 15:54:08
     N/A    off  psu8/1      46%       intake     Present        OK  20241030 15:54:09

Additional Information (Optional)

Platform library support change sonic-net/sonic-buildimage#20929 should be merged before this change.

@abdosi

abdosi commented Oct 30, 2024

@bmridul and @mlok-nokia, can you help check this? Does this need a SONiC change, or should we push this into the platform implementation?


@gechiang gechiang left a comment

LGTM

@gechiang

Just want to highlight that this change depends on the PR mentioned by the author in the PR description:
Platform library support change sonic-net/sonic-buildimage#20603 should be merged before this change.

@rlhui

rlhui commented Nov 27, 2024

Just want to highlight that this change depends on the PR mentioned by the author in the PR description: Platform library support change sonic-net/sonic-buildimage#20603 should be merged before this change.

@bmridul and @mlok-nokia, can you help check this? Does this need a SONiC change, or should we push this into the platform implementation?

@bmridul , @mlok-nokia ping again

@spilkey-cisco

I like this change! But at the same time, I think support should be added to include fans from all possible locations, not just Modules. As with Module, fans may exist directly on the chassis, and not in a fan drawer. Can a type and handling for that be added in this PR as well?

@patrickmacarthur

I like this change! But at the same time, I think support should be added to include fans from all possible locations, not just Modules. As with Module, fans may exist directly on the chassis, and not in a fan drawer. Can a type and handling for that be added in this PR as well?

This already exists today; any fan in a fan drawer on the chassis is already displayed. This PR is just adding missing output for fans that are attached to modules.

@spilkey-cisco

I like this change! But at the same time, I think support should be added to include fans from all possible locations, not just Modules. As with Module, fans may exist directly on the chassis, and not in a fan drawer. Can a type and handling for that be added in this PR as well?

This already exists today; any fan in a fan drawer on the chassis is already displayed. This PR is just adding missing output for fans that are attached to modules.

I'm referring specifically to fans not in a physical fan drawer (adding CHASSIS as a type in addition to DRAWER). Today, vendors must configure some notion of a 'logical' fan drawer to house fans connected directly to the chassis (not in a physical fan drawer), essentially treating the chassis itself as a fan drawer. This becomes even more confusing on a chassis that has both physical fan drawers with fans and fans connected directly to the chassis without fan drawers. This feels like an unnecessary limitation, but perhaps there is some reason I'm missing as to why this should not be done?
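As a rough illustration of the suggestion above, the sketch below tags each fan with the kind of parent it hangs off, including a CHASSIS kind for fans with no drawer. The FanParent enum and iter_fans function are hypothetical and exist only for this example; fans are represented by plain name strings, and nothing here reflects the actual thermalctld types.

```python
from enum import Enum


class FanParent(Enum):
    DRAWER = "drawer"
    MODULE = "module"
    CHASSIS = "chassis"  # hypothetical kind: fans attached directly to the chassis


def iter_fans(drawer_fans, module_fans, chassis_fans):
    """Yield (parent kind, parent index, fan name) for every fan.

    drawer_fans and module_fans are lists of per-parent fan-name lists;
    chassis_fans is a flat list because the chassis is the only parent.
    """
    for idx, fans in enumerate(drawer_fans):
        for name in fans:
            yield FanParent.DRAWER, idx, name
    for idx, fans in enumerate(module_fans):
        for name in fans:
            yield FanParent.MODULE, idx, name
    for name in chassis_fans:
        # No 'logical' drawer is invented here; the parent index is simply None.
        yield FanParent.CHASSIS, None, name


rows = list(iter_fans(
    drawer_fans=[["drawer1/fan1"]],
    module_fans=[["fan0/1"]],
    chassis_fans=["chassis/fan1"],
))
```

Here a system with both drawer fans and drawer-less chassis fans yields distinct rows for each, without treating the chassis as a drawer.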

except Exception as e:
    self.log_warning('Failed to update fan status - {}'.format(repr(e)))

for module_index, module in enumerate(self.chassis.get_all_modules()):
The thermal update code first checks is_chassis_system before looping over module details; should that be done here as well? Or is it perhaps an unnecessary check there? https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-thermalctld/scripts/thermalctld#L599 That code also looks at PSUs connected to the modules, which may themselves have fans.

The PR title describes this as only for non-CPU modules, but it looks like that is not actually a limitation; won't this get all fans in any module where they are configured? Not necessarily a bad thing, but worth noting.
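To make the question concrete, here is a small sketch comparing a guarded and an unguarded module loop. It assumes a chassis object exposing is_modular_chassis() and get_all_modules(); StubChassis and module_fan_names are hypothetical stand-ins, and each module is reduced to a plain list of fan names for brevity.

```python
class StubChassis:
    def __init__(self, modular, modules):
        self._modular = modular
        self._modules = modules

    def is_modular_chassis(self):
        return self._modular

    def get_all_modules(self):
        return self._modules


def module_fan_names(chassis, check_modular):
    """Hypothetical helper: list (module index, fan index, fan name) tuples."""
    # Optional guard, mirroring the kind of check the reviewer refers to.
    if check_modular and not chassis.is_modular_chassis():
        return []
    names = []
    for module_index, module_fans in enumerate(chassis.get_all_modules()):
        for fan_index, fan_name in enumerate(module_fans):
            names.append((module_index, fan_index, fan_name))
    return names


# A fixed system with no modules, and a modular chassis with one module.
fixed = StubChassis(modular=False, modules=[])
modular = StubChassis(modular=True, modules=[["fan0/1", "fan0/2"]])
```

In this sketch the guard changes nothing for a fixed system that reports no modules, since the loop body never runs; it only matters if a non-modular platform populates its module list.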

@mlok-nokia

mlok-nokia commented Nov 29, 2024

Just want to highlight that this change depends on the PR mentioned by the author in the PR description: Platform library support change sonic-net/sonic-buildimage#20603 should be merged before this change.

@bmridul and @mlok-nokia, can you help check this? Does this need a SONiC change, or should we push this into the platform implementation?

@bmridul , @mlok-nokia ping again

I think these two PRs can be pushed independently. PR 20603 is just Arista's platform-specific code.

@gechiang gechiang requested a review from prgeor December 2, 2024 23:52
@patrickmacarthur

I like this change! But at the same time, I think support should be added to include fans from all possible locations, not just Modules. As with Module, fans may exist directly on the chassis, and not in a fan drawer. Can a type and handling for that be added in this PR as well?

This already exists today; any fan in a fan drawer on the chassis is already displayed. This PR is just adding missing output for fans that are attached to modules.

I'm referring specifically to fans not in a physical fan drawer (adding CHASSIS as a type in addition to DRAWER). Today, vendors must configure some notion of a 'logical' fan drawer to house fans connected directly to the chassis (not in a physical fan drawer), essentially treating the chassis itself as a fan drawer. This could be further confused by a chassis that could have both physical fan drawers with fans, and fans directly connected to the chassis without fan drawers. This feels like an unnecessary limitation, but perhaps there is some reason I'm missing as to why this should not be done?

We discussed this during the chassis meeting: it should be implemented in a follow-up PR, since it widens the scope to affect not just chassis but also fixed systems.

@gechiang gechiang requested a review from assrinivasan December 4, 2024 21:05
@gechiang

gechiang commented Dec 4, 2024

@assrinivasan, please take a quick look at this PR to see if you have any concerns, especially for non-chassis platforms, while @prgeor is out of office.
Thanks!

@rlhui rlhui merged commit 60e7224 into sonic-net:master Dec 6, 2024
5 checks passed
@bingwang-ms

@patrickmacarthur Can you please confirm that this change will not break the CLI on single-ASIC platforms?

vvolam pushed a commit to vvolam/sonic-platform-daemons that referenced this pull request Jan 3, 2025
* thermalctld: Add support for fans on non-CPU modules

* Add module fan to unit tests
prgeor pushed a commit that referenced this pull request Feb 6, 2025
…evice is in detaching mode (#546)

* Skip logging the warning, if device is in detaching mode

* Add detach_info table and unittests

* Fix unit tests

* Increase code coverage

* Remove unused header import

* Fix dict get values

* Increase code coverage

* Increase test coverage

* [SmartSwitch] Extend implementation of the DPU chassis daemon. (#563)

* Addition of DPU Chassis for thermalctld (#564)

* [stormond] Added new dynamic field 'last_sync_time' to STATE_DB (#535)

* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB

* Moved 'start' message to actual starting point of the daemon

* Added functions for formatted and epoch time for user friendly time display

* Made changes per prgeor review comments

* Pivot to SysLogger for all logging

* Increased log level so that they are seen in syslogs

* Code coverage improvement

* [lag_id] Add lagid to free_list when LC absent for 30 minutes (#542)

When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST

This PR works with the following 2 PRs:
sonic-net/sonic-swss#3303
sonic-net/sonic-buildimage#20369

Signed-off-by: mlok <[email protected]>

* Fixed bug in chassisd causing incorrect number of ASICs in CHASSIS_STATE_DB (#560)

Fixed the bug in chassisd due to which incorrect number of ASICs were being pushed to CHASSIS_STATE_DB.

* thermalctld: Add support for fans on non-CPU modules (#555)

* thermalctld: Add support for fans on non-CPU modules

* Add module fan to unit tests

* Advanced Azure pipeline to Bookworm (#572)

Description
This PR advances the Azure pipeline for sonic_platform_daemons from bullseye to bookworm. This addresses the problems the sonic-platform-daemons azp has been having since the upgrade to bookworm. See Pipelines - Run 20241210.8 logs for details.

* Take non-CMIS xcvrs out of lpmode in SFF Manager (#565)

Description
Fix non-CMIS transceivers in down state by bringing them out of low power mode in the SFF Manager Task.
This is intended to work together with the change in sonic-net/sonic-buildimage#20886.

Motivation and Context
Non-CMIS transceivers were not functioning correctly when put into Low Power mode. So XCVRD now brings them out of lpmode.

How Has This Been Tested?
Loaded an image containing this change alongside the change from sonic-net/sonic-buildimage#20886 on an Arista chassis containing a Clearwater2 linecard.
Verified that without this image some interfaces were in a down state but with the image all interfaces came up as expected.

* Added SmartSwitch support in chassisd and enabling chassisd  (#467)

Added SmartSwitch support in chassisd and enabling chassisd

* [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function (#576)

Description
Move the PSU parent information generation to the loop run function from the initialization function

Motivation and Context
Fixes #575

How Has This Been Tested?
Tested on a Cisco chassis: the PHYSICAL_ENTITY_INFO|PSU * entries can be re-inserted after a thermalctld restart.
Also monitored the state DB memory for hours; it works well:

* [chassisd] Address the chassisd crash issue and add UT for it (#573)

Description
On the Nokia platform, the slot name of the Supervisor is the string "A" instead of a number. Using "int" to convert it could cause a backtrace. We should use the slot value for any checking without any conversion. This fixes sonic-net/sonic-buildimage#21131

Motivation and Context
Modify _get_module_info not to convert "slot" to a string value, and modify the code not to convert the slot value to an int for any checking; it now uses the value returned by get_slot() directly. Also add the unit test test_moduleupdater_check_slot_string() to validate it.

How Has This Been Tested?
Tested on 202405 branch


Signed-off-by: mlok <[email protected]>

* Fix a comment

---------

Signed-off-by: mlok <[email protected]>
Co-authored-by: Oleksandr Ivantsiv <[email protected]>
Co-authored-by: Gagan Punathil Ellath <[email protected]>
Co-authored-by: Ashwin Srinivasan <[email protected]>
Co-authored-by: Marty Y. Lok <[email protected]>
Co-authored-by: Vivek Verma <[email protected]>
Co-authored-by: Patrick MacArthur <[email protected]>
Co-authored-by: Peter Bailey <[email protected]>
Co-authored-by: rameshraghupathy <[email protected]>
Co-authored-by: Jianquan Ye <[email protected]>