Fix for dpu midplane state update affecting dpu control_plane/data_plane state :issue-21371 #584

Open
wants to merge 1 commit into
base: master

Contributor

@rameshraghupathy commented Jan 16, 2025

Fix for dpu midplane state update affecting dpu control_plane/data_plane state

Description

The DPU midplane state update and the DPU control-plane/data-plane (cp/dp) state update run in parallel. The midplane state update function reads the data from Redis DB, updates the midplane state, and then writes back the entire table. It appears that the DPU cp/dp state update sometimes lands between the midplane read and write, so the write-back overwrites the new cp/dp states with stale values. The fix is to skip reading the table and write only the DPU midplane state, so the midplane update no longer touches the cp/dp fields.

Fixes: sonic-net/sonic-buildimage#21371

Motivation and Context

The DPU midplane state update and the DPU control-plane/data-plane (cp/dp) state update run in parallel. The midplane state update function reads the data from Redis DB, updates the midplane state, and then writes back the entire table. It appears that the DPU cp/dp state update sometimes lands between the midplane read and write, so the write-back overwrites the new cp/dp states with stale values. The fix is to skip reading the table and write only the DPU midplane state, so the midplane update no longer touches the cp/dp fields.
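
For reviewers, the lost-update pattern described above can be sketched roughly as follows. This is a minimal illustration, not the chassisd code: `FakeHashTable`, `racy_midplane_update`, `safe_midplane_update`, `demo`, and the `midplane_state` field name are all hypothetical, and a plain in-memory dict stands in for the Redis table. The `interleaved_write` hook simulates the cp/dp update arriving between the midplane read and write.

```python
# Illustrative sketch only. A dict stands in for the Redis-backed table;
# class, function, and field names are assumptions, not the chassisd API.


class FakeHashTable:
    """Minimal in-memory stand-in for a table of per-DPU rows."""

    def __init__(self):
        self._rows = {}

    def hgetall(self, key):
        # Return a snapshot (copy) of the whole row.
        return dict(self._rows.get(key, {}))

    def hset(self, key, mapping):
        # Write only the given fields; other fields in the row are untouched.
        self._rows.setdefault(key, {}).update(mapping)

    def replace_row(self, key, row):
        # Overwrite the whole row with the caller's snapshot.
        self._rows[key] = dict(row)


def racy_midplane_update(table, key, midplane_state, interleaved_write=None):
    """Read-modify-write: a write landing between read and write is lost."""
    row = table.hgetall(key)
    if interleaved_write is not None:
        interleaved_write()  # simulated cp/dp update from the other path
    row["midplane_state"] = midplane_state
    table.replace_row(key, row)  # stale cp/dp values are written back


def safe_midplane_update(table, key, midplane_state, interleaved_write=None):
    """Field-level write: only the midplane field is touched."""
    if interleaved_write is not None:
        interleaved_write()
    table.hset(key, {"midplane_state": midplane_state})


def demo(update_fn):
    """Run one midplane update with a cp/dp update interleaved."""
    table = FakeHashTable()
    table.hset("DPU0", {
        "midplane_state": "down",
        "control_plane_state": "down",
        "data_plane_state": "down",
    })

    def cp_dp_update():
        table.hset("DPU0", {
            "control_plane_state": "up",
            "data_plane_state": "up",
        })

    update_fn(table, "DPU0", "up", interleaved_write=cp_dp_update)
    return table.hgetall("DPU0")
```

With `demo(racy_midplane_update)` the cp/dp fields end up back at their stale values, while `demo(safe_midplane_update)` leaves them as the cp/dp path wrote them. The sketch only models the interleaving; it does not reproduce the actual timing in the daemon.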

How Has This Been Tested?

Toggled the DPU on/off state multiple times and checked that the state is reflected correctly each time. Since this code path is NVIDIA-specific, we are relying on their testing as well.

Additional Information (Optional)

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@rameshraghupathy
Contributor Author

/azp run

Commenter does not have sufficient privileges for PR 584 in repo sonic-net/sonic-platform-daemons

@vvolam vvolam left a comment

LGTM

@rameshraghupathy
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).

Development

Successfully merging this pull request may close these issues.

[Smartswitch][Chassisd] Race condition causes incorrect update of control_plane_state and data_plane_state
4 participants