Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle page select and remote access check after changing SFP target #462

Merged
merged 2 commits into from
May 2, 2024

Conversation

mihirpat1
Copy link
Contributor

@mihirpat1 mihirpat1 commented May 2, 2024

Description

After changing the target to remote, many sfputil commands which access transceiver EEPROM fail with the below traceback if either of the below scenario is true.

  1. If the remote target had page select register set to non-zero
  2. If the remote target was powered down or is removed from the device

Traceback details

Running command: sudo sfputil show fwversion Ethernet232
['Traceback (most recent call last):', '  File "/usr/local/bin/sfputil", line 8, in <module>', '    sys.exit(cli())', '  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__', '    return self.main(*args, **kwargs)', '  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main', '    rv = self.invoke(ctx)', '  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke', '    return _process_result(sub_ctx.command.invoke(sub_ctx))', '  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke', '    return _process_result(sub_ctx.command.invoke(sub_ctx))', '  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke', '    return ctx.invoke(self.callback, **ctx.params)', '  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke', '    return callback(*args, **kwargs)', '  File "/usr/local/lib/python3.9/dist-packages/sfputil/main.py", line 1056, in fwversion', '    show_firmware_version(physical_port)', '  File "/usr/local/lib/python3.9/dist-packages/sfputil/main.py", line 1026, in show_firmware_version', '    api = sfp.get_xcvr_api()', '  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sfp_base.py", line 468, in get_xcvr_api', '    self.refresh_xcvr_api()', '  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sfp_base.py", line 458, in refresh_xcvr_api', '    self._xcvr_api = self._xcvr_api_factory.create_xcvr_api()', '  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/xcvr_api_factory.py", line 76, in create_xcvr_api', '    vendor_pn = self._get_vendor_part_num()', '  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/xcvr_api_factory.py", line 67, in _get_vendor_part_num', '    vendor_pn = part_num.decode()', "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 2: invalid start byte"]

Motivation and Context

The set_firmware_download_target_end API needs to be enhanced since it currently

  1. Does not set the page select byte to 0 after changing the target (assumes it to be zero). In addition, the optoe kernel driver changes page select byte only if a non-zero page is being accessed
    https://github.com/opencomputeproject/OpenNetworkLinux/blob/e85f118fb6400caf869f4a83b92c0bc3123962da/packages/base/any/kernels/modules/optoe.c#L522
  2. Does not ensure if the remote target is accessible after changing the target.

If either of the above scenario is true, when sfputil or xcvrd attempts reading the Vendor name and Vendor part number (residing an Upper page 0h) to create the relevant transceiver specific instances (

), an unexpected data is read since

  1. If the current scenario is #1, the data of upper page will be actually read from the corresponding non-zero page selected
  2. If the current scenarios is #2, all registers of EEPROM except TARGET_MODE register will have a value of 0xff populated

In addition to the above changes, the get_current_target_end API has been added for future use.

How Has This Been Tested?

Following test cases were executed

  1. Change target to E1 and dump page 1
  2. Modify page select to 0x10 on E1 and change target to E1
  3. Modify page select to 0x11 on E2 and change target to E2
  4. Remove E1, change target to E1 and ensure target is changed back to 0
  5. Ensure firmware info is not populated when remote target is not accessible
  6. Ensure get_firmware_download_target_end returns valid output

Additional Information (Optional)

MSFT ADO - 27920899

@prgeor
Copy link
Collaborator

prgeor commented May 2, 2024

@mihirpat1 you test the case where

  1. remote target is powered off or removed


return True

def _handle_error_and_restore_target_to_E0(self, error_message):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihirpat1 this API be rename to _restore_target_E0(). Let the caller print the log errors before calling this. This allows reusing this API in future without throwing error logs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor I have addressed this now.


return True

def get_firmware_download_target_end(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihirpat1 nit, rename to get_current_target_end()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihirpat1 please check if this is tested for

  1. remote target is already powered off
  2. remote target is powered off after setting the target > 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor I have addressed this now. I haven't tested the power off scenario though.

@mihirpat1
Copy link
Contributor Author

@mihirpat1 you test the case where

  1. remote target is powered off or removed

@prgeor I have tested the case wherein remote target is removed

@prgeor prgeor merged commit 655a5ff into sonic-net:master May 2, 2024
5 checks passed
@StormLiangMS
Copy link

hi @mihirpat1 @prgeor there is a conflict, could you raise separate PR?

@mihirpat1
Copy link
Contributor Author

/azp run

@mihirpat1
Copy link
Contributor Author

hi @mihirpat1 @prgeor there is a conflict, could you raise separate PR?

@StormLiangMS Can you please try to cherry pick now since the dependent PR is merged?

mssonicbld pushed a commit to mssonicbld/sonic-platform-common that referenced this pull request May 9, 2024
…onic-net#462)

* Handle page select and remote access check after changing SFP target

Signed-off-by: Mihir Patel <[email protected]>

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202305: #468

mssonicbld pushed a commit that referenced this pull request May 9, 2024
…462)

* Handle page select and remote access check after changing SFP target

Signed-off-by: Mihir Patel <[email protected]>

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
@StormLiangMS
Copy link

could you update the test result for 202311 @mihirpat1 ?

@mihirpat1
Copy link
Contributor Author

could you update the test result for 202311 @mihirpat1 ?

@StormLiangMS I have now tested this on 202311. Can you please help to merge this?

mssonicbld pushed a commit to mssonicbld/sonic-platform-common that referenced this pull request Jun 24, 2024
…onic-net#462)

* Handle page select and remote access check after changing SFP target

Signed-off-by: Mihir Patel <[email protected]>

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
mssonicbld pushed a commit to mssonicbld/sonic-platform-common that referenced this pull request Jun 24, 2024
…onic-net#462)

* Handle page select and remote access check after changing SFP target

Signed-off-by: Mihir Patel <[email protected]>

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202311: #477

mssonicbld pushed a commit to mssonicbld/sonic-platform-common that referenced this pull request Jun 24, 2024
…onic-net#462)

* Handle page select and remote access check after changing SFP target

Signed-off-by: Mihir Patel <[email protected]>

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
mssonicbld pushed a commit that referenced this pull request Jun 25, 2024
…462)

* Handle page select and remote access check after changing SFP target

Signed-off-by: Mihir Patel <[email protected]>

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-platform-common that referenced this pull request Oct 25, 2024
…mpty slot issue (sonic-net#462)

* [chassis][linecard] Fix Module LINECARD<> went off-line message for empty slot issue

Signed-off-by: mlok <[email protected]>

* Define/use get_module_current_status()

---------

Signed-off-by: mlok <[email protected]>
Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants