[Chassis][LAG_ID] Address the same lagid been used in two different LCs issue #3303

Merged: 3 commits, Nov 27, 2024

@mlok-nokia mlok-nokia commented Sep 28, 2024

What I did
Created SYSTEM_LAG_IDS_FREE_LIST, which is used to assign a lagid to every Portchannel at creation.

  1. Portchannel creation
    a) If the Portchannel is created with a valid plagid:
    * If plagid is in the free list, use plagid and remove it from SYSTEM_LAG_IDS_FREE_LIST. Add this lagid to SYSTEM_LAG_ID_SET for debug info.
    * If plagid is not in the free list, lpop the first lagid from SYSTEM_LAG_IDS_FREE_LIST and use it. Add this lagid to SYSTEM_LAG_ID_SET for debug info.
    b) If the Portchannel is created with an invalid plagid or without any lagid:
    * lpop the first lagid from SYSTEM_LAG_IDS_FREE_LIST and use it. Add this lagid to SYSTEM_LAG_ID_SET for debug info.
  2. Portchannel deletion
    * Append the lagid to the end of SYSTEM_LAG_IDS_FREE_LIST and remove it from SYSTEM_LAG_ID_SET.
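
For readers who want to follow the rules above step by step, here is a minimal in-memory sketch in Python. It is not the code in this PR: the class name `LagIdAllocator`, the `start`/`end` range, treating "valid" as "inside that range", and returning `None` when the free list is empty are all assumptions made for illustration. A `deque` stands in for SYSTEM_LAG_IDS_FREE_LIST and a `set` for SYSTEM_LAG_ID_SET.

```python
from collections import deque
from typing import Optional


class LagIdAllocator:
    """Illustrative model of SYSTEM_LAG_IDS_FREE_LIST and SYSTEM_LAG_ID_SET."""

    def __init__(self, start: int, end: int) -> None:
        # Hypothetical ID range; assumed here only to define "valid".
        self.start = start
        self.end = end
        self.free_list = deque(range(start, end + 1))  # SYSTEM_LAG_IDS_FREE_LIST
        self.id_set = set()                            # SYSTEM_LAG_ID_SET (debug info)

    def allocate(self, plagid: Optional[int] = None) -> Optional[int]:
        """Pick a lagid for a new Portchannel; None if nothing is free (assumption)."""
        valid = plagid is not None and self.start <= plagid <= self.end
        if valid and plagid in self.free_list:
            # 1a, first bullet: the requested ID is free, so take exactly that one.
            self.free_list.remove(plagid)
            lagid = plagid
        elif self.free_list:
            # 1a second bullet and 1b: pop the head of the free list (lpop).
            lagid = self.free_list.popleft()
        else:
            return None
        self.id_set.add(lagid)
        return lagid

    def release(self, lagid: int) -> None:
        """2: on Portchannel deletion, append the ID to the end of the free list."""
        # The membership guard is an assumption to keep the free list duplicate-free.
        if lagid in self.id_set:
            self.id_set.remove(lagid)
            self.free_list.append(lagid)


alloc = LagIdAllocator(1, 4)
first = alloc.allocate(3)    # 3 is free, so 3 is used
second = alloc.allocate(3)   # 3 is taken, so the head of the free list is used
alloc.release(first)         # 3 goes to the end of the free list
```

Because released IDs go to the tail while new IDs come from the head, a just-released lagid is the last one to be handed out again in this model.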

This PR works with the following 2 PRs:
sonic-net/sonic-platform-daemons#542
sonic-net/sonic-buildimage#20369

Because of the dependencies between them, merging these 3 PRs in the following order helps avoid breaking the image:
First: PR sonic-net/sonic-buildimage#20369
Second: PR #3303 (This PR)
Third: PR sonic-net/sonic-platform-daemons#542

Why I did it
To address the issue where the same lagid could be used by two Portchannels on two different linecards. The issue occurs when many linecards are rebooted together, with a 20-second delay for each LC reboot.

How I verified it

Details if related

@arlakshm
Contributor

can you update the UT as well.

@mlok-nokia mlok-nokia force-pushed the lag_id_conflict_issue branch 3 times, most recently from 80f2a59 to e9938d5 Compare October 1, 2024 17:33
@mlok-nokia
Contributor Author

> can you update the UT as well.

UT has been updated.

@mlok-nokia mlok-nokia marked this pull request as ready for review October 2, 2024 19:42
arlakshm previously approved these changes Nov 13, 2024
@arlakshm
Contributor

/Azp Azure.sonic-swss

Command 'Azure.sonic-swss' is not supported by Azure Pipelines.

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@rlhui
Contributor

rlhui commented Nov 22, 2024

@mlok-nokia , branch is out-of-date

@mlok-nokia
Contributor Author

> @mlok-nokia , branch is out-of-date

Fixed, Thanks

@mlok-nokia mlok-nokia force-pushed the lag_id_conflict_issue branch from 39bce94 to e087f40 Compare November 22, 2024 21:55
@mlok-nokia
Contributor Author

mlok-nokia commented Nov 25, 2024

@arlakshm @judyjoseph The PR has been updated. Please merge it.

@arlakshm
Contributor

@prsunny please help merge this PR

mssonicbld pushed a commit to mssonicbld/sonic-swss that referenced this pull request Nov 30, 2024
@mssonicbld
Collaborator

Cherry-pick PR to 202405: #3404

mssonicbld pushed a commit to sonic-net/sonic-platform-daemons that referenced this pull request Nov 30, 2024
When an LC is absent for 30 minutes, the database cleanup kicks in. When a LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST.

This PR works with the following 2 PRs:
sonic-net/sonic-swss#3303
sonic-net/sonic-buildimage#20369

Signed-off-by: mlok <[email protected]>
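
The cleanup behavior described in that commit message can be sketched as follows. This is an illustration only, not the sonic-platform-daemons code: the function name `cleanup_absent_linecard`, the `lag_ids_by_lc` mapping from linecard name to owned lagids, and passing the absence time in seconds are hypothetical; only the 30-minute threshold and the "append to the free list" rule come from the text.

```python
from collections import deque

ABSENT_TIMEOUT_SECS = 30 * 60  # 30 minutes, as stated in the commit message


def cleanup_absent_linecard(lc_name, absent_secs, lag_ids_by_lc, free_list, id_set):
    """Release every lagid owned by a long-absent linecard; return the released IDs."""
    if absent_secs < ABSENT_TIMEOUT_SECS:
        return []  # not absent long enough, so no cleanup yet
    # lag_ids_by_lc is a hypothetical ownership map used only for this sketch.
    released = lag_ids_by_lc.pop(lc_name, [])
    for lagid in released:
        id_set.discard(lagid)     # drop from SYSTEM_LAG_ID_SET
        free_list.append(lagid)   # append to the end of SYSTEM_LAG_IDS_FREE_LIST
    return released


free_list = deque([3, 4])
id_set = {1, 2}
owners = {"LC1": [1], "LC2": [2]}
released = cleanup_absent_linecard("LC1", 30 * 60, owners, free_list, id_set)
```

Without the append step, IDs held by a removed linecard would stay out of the free list and could never be assigned again.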
mssonicbld pushed a commit that referenced this pull request Nov 30, 2024
bradh352 pushed a commit to bradh352/sonic-swss that referenced this pull request Dec 4, 2024
bradh352 pushed a commit to bradh352/sonic-swss that referenced this pull request Dec 4, 2024
arista-nwolfe added a commit to arista-nwolfe/sonic-buildimage that referenced this pull request Dec 9, 2024
rlhui pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 10, 2024
The following PRs made 1024 incorrect:
#20369
sonic-net/sonic-swss#3303

This fixes:
#21096
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Dec 10, 2024
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Dec 10, 2024
mssonicbld pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 11, 2024
divyachandralekha pushed a commit to divyachandralekha/sonic-swss that referenced this pull request Dec 12, 2024
divyachandralekha pushed a commit to divyachandralekha/sonic-swss that referenced this pull request Dec 12, 2024
vvolam pushed a commit to vvolam/sonic-platform-daemons that referenced this pull request Jan 3, 2025
@ysmanman
Contributor

ysmanman commented Jan 9, 2025

@arlakshm curious if we want to backport the fix to 202205?

mssonicbld added a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 16, 2025
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Jan 21, 2025
stepanblyschak pushed a commit to stepanblyschak/sonic-swss that referenced this pull request Jan 27, 2025
prgeor pushed a commit to sonic-net/sonic-platform-daemons that referenced this pull request Feb 6, 2025
…evice is in detaching mode (#546)

* Skip logging the warning, if device is in detaching mode

* Add detach_info table and unittests

* Fix unit tests

* Increase code coverage

* Remove unused header import

* Fix dict get values

* Increase code coverage

* Increase test coverage

* [SmartSwitch] Extend implementation of the DPU chassis daemon. (#563)

* Addition of DPU Chassis for thermalctld (#564)

* [stormond] Added new dynamic field 'last_sync_time' to STATE_DB (#535)

* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB

* Moved 'start' message to actual starting point of the daemon

* Added functions for formatted and epoch time for user friendly time display

* Made changes per prgeor review comments

* Pivot to SysLogger for all logging

* Increased log level so that they are seen in syslogs

* Code coverage improvement

* [lag_id] Add lagid to free_list when LC absent for 30 minutes (#542)

When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST

This PR works with the following 2 PRs:
sonic-net/sonic-swss#3303
sonic-net/sonic-buildimage#20369

Signed-off-by: mlok <[email protected]>

* Fixed bug in chassisd causing incorrect number of ASICs in CHASSIS_STATE_DB (#560)

Fixed the bug in chassisd due to which incorrect number of ASICs were being pushed to CHASSIS_STATE_DB.

* thermalctld: Add support for fans on non-CPU modules (#555)

* thermalctld: Add support for fans on non-CPU modules

* Add module fan to unit tests

* Advanced Azure pipeline to Bookworm (#572)

Description
This PR advances the azure pipeline on sonic_platform_daemons from bullseye to bookworm. This fixes the issue where sonic-platform-daemons azp is having some issues due to upgrade to bookworm. See Pipelines - Run 20241210.8 logs for details.

* Take non-CMIS xcvrs out of lpmode in SFF Manager (#565)

Description
Fix non-CMIS transceivers in down state by bringing them out of low power mode in the SFF Manager Task.
This is intended to work together with the change in sonic-net/sonic-buildimage#20886.

Motivation and Context
Non-CMIS transceivers were not functioning correctly when put into Low Power mode. So XCVRD now brings them out of lpmode.

How Has This Been Tested?
Loaded an image containing this change alongside the change from sonic-net/sonic-buildimage#20886 on an Arista chassis containing a Clearwater2 linecard.
Verified that without this image some interfaces were in a down state but with the image all interfaces came up as expected.

* Added SmartSwitch support in chassisd and enabling chassisd  (#467)

Added SmartSwitch support in chassisd and enabling chassisd

* [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function (#576)

Description
Move the PSU parent information generation to the loop run function from the initialization function

Motivation and Context
Fixes #575

How Has This Been Tested?
Tested on a Cisco chassis: the PHYSICAL_ENTITY_INFO|PSU * entries can be re-inserted after a thermalctld restart.
Also monitored the state DB memory for hours; it works well.

* [chassisd] Address the chassisd crash issue and add UT for it (#573)

Description
On the Nokia platform, the Supervisor's slot name is the string "A" rather than a number, so converting it with "int" can cause a backtrace. The slot value should be used in checks without any conversion. This fixes sonic-net/sonic-buildimage#21131

Motivation and Context
Modify _get_module_info so that it does not convert "slot" to a string value, and modify the code so that it does not convert the slot value to an integer before any check; the value returned by get_slot() is used directly. Also add the UT test_moduleupdater_check_slot_string() to validate this.

How Has This Been Tested?
Tested on 202405 branch


Signed-off-by: mlok <[email protected]>

* Fix a comment

---------

Signed-off-by: mlok <[email protected]>
Co-authored-by: Oleksandr Ivantsiv <[email protected]>
Co-authored-by: Gagan Punathil Ellath <[email protected]>
Co-authored-by: Ashwin Srinivasan <[email protected]>
Co-authored-by: Marty Y. Lok <[email protected]>
Co-authored-by: Vivek Verma <[email protected]>
Co-authored-by: Patrick MacArthur <[email protected]>
Co-authored-by: Peter Bailey <[email protected]>
Co-authored-by: rameshraghupathy <[email protected]>
Co-authored-by: Jianquan Ye <[email protected]>
shiraez pushed a commit to Marvell-switching/sonic-swss that referenced this pull request Feb 17, 2025