AS7816-64X syncd Crashed on latest due to Dynamic Breakout Port Feature commit #3508

Open
pollyhsu2git opened this issue Sep 25, 2019 · 0 comments

pollyhsu2git commented Sep 25, 2019

Description

  1. A customer reported that syncd on the AS7816-64X crashed after an ONIE installation of an image built from their 2019-09-17 pull of master.

  2. We reproduced the problem on the latest daily build (Jenkins #86, dated 2019-09-21). Models whose SDK configuration defines loopback / management ports hit the SONiC orchagent crash issue due to [Feature: DynamicPortBreakout] Use consolidated bcm file for Seastone platform #3240, merged 2019-08-08.

  3. After discussing this with Broadcom through our channel, we are highlighting it to the SONiC community because [Feature: DynamicPortBreakout] Use consolidated bcm file for Seastone platform #3240 affects all Broadcom-based vendor models whose SDK configuration defines loopback / management ports.

Steps to reproduce the issue:

  1. Download the latest SONiC image, SONiC.HEAD.86-dirty-20190921.062232, from the SONiC Jenkins daily build system (https://sonic-jenkins.westus2.cloudapp.azure.com/job/broadcom/job/buildimage-brcm-all/).

  2. After the ONIE installation and the first boot, check the syncd status with "docker ps" and "bcmcmd ps". The bcmcmd call fails because the syncd container has exited:

root@sonic:/home/admin# docker ps
CONTAINER ID        IMAGE                             COMMAND                  CREATED              STATUS              PORTS               NAMES
ee7c57b3f430        docker-dhcp-relay:latest          "/usr/bin/docker_ini"   About a minute ago   Up 10 seconds                           dhcp_relay
8b0888a73480        docker-syncd-brcm:latest          "/usr/bin/supervisord"   About a minute ago   Up 12 seconds                           syncd
81b5a36e9a6c        docker-sflow:latest               "/usr/bin/supervisord"   About a minute ago   Up 14 seconds                           sflow
70c20d77fbce        docker-teamd:latest               "/usr/bin/supervisord"   About a minute ago   Up 14 seconds                           teamd
20fd7f548768        docker-router-advertiser:latest   "/usr/bin/supervisord"   About a minute ago   Up 8 seconds                            radv
83c9032c9036        docker-snmp-sv2:latest            "/usr/bin/supervisord"   About a minute ago   Up 14 seconds                           snmp
ca041451178f        docker-lldp-sv2:latest            "/usr/bin/supervisord"   2 minutes ago        Up 2 minutes                            lldp
8f20dc00b134        docker-platform-monitor:latest    "/usr/bin/docker_ini"   2 minutes ago        Up 2 minutes                            pmon
fbf8fa390628        docker-fpm-frr:latest             "/usr/bin/supervisord"   2 minutes ago        Up 2 minutes                            bgp
c5f864a9a4b1        docker-sonic-telemetry:latest     "/usr/bin/supervisord"   2 minutes ago        Up 2 minutes                            telemetry
093e6c6a33ca        docker-database:latest            "/usr/local/bin/dock"   2 minutes ago        Up 2 minutes                            database
root@sonic:/home/admin# docker bcmcmd ps
Error response from daemon: Container 8b0888a734801e5dcb0aafe94a357b6b13d8c8f633b4d2d3f505b0095f84687d is not running
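
In the `docker ps` output above, several containers report an uptime far shorter than their age, which may indicate they were restarted after syncd exited. The following is only a heuristic sketch for spotting that pattern. The function names, the trimmed `SAMPLE` text, and the 30-second slack are assumptions made for illustration, and Docker rounds these durations, so the result is approximate.

```python
import re

# Seconds per unit for Docker's human-readable durations.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def duration_seconds(text):
    """Convert e.g. 'Up 12 seconds' or 'About a minute ago' to seconds."""
    m = re.search(r"\b(\d+|an?)\s+(second|minute|hour|day)s?\b", text)
    if m is None:
        return None
    count = 1 if m.group(1) in ("a", "an") else int(m.group(1))
    return count * UNIT_SECONDS[m.group(2)]


def restarted_containers(docker_ps_output, slack=30):
    """Return names of containers whose uptime is well below their age."""
    flagged = []
    for line in docker_ps_output.strip().splitlines()[1:]:  # skip header row
        # Columns in `docker ps` output are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 6:
            continue
        created, status, name = fields[3], fields[4], fields[-1]
        if not status.startswith("Up"):
            continue
        age = duration_seconds(created)
        uptime = duration_seconds(status)
        if age is not None and uptime is not None and age - uptime > slack:
            flagged.append(name)
    return flagged


# Two rows trimmed from the report above (column spacing shortened).
SAMPLE = """\
CONTAINER ID  IMAGE                     COMMAND                 CREATED             STATUS         PORTS  NAMES
8b0888a73480  docker-syncd-brcm:latest  "/usr/bin/supervisord"  About a minute ago  Up 12 seconds         syncd
ca041451178f  docker-lldp-sv2:latest    "/usr/bin/supervisord"  2 minutes ago       Up 2 minutes          lldp
"""

print(restarted_containers(SAMPLE))  # ['syncd']
```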

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

  1. The syslog shows that syncd exited after a failed port operation: removing a port object returned SAI_STATUS_NOT_SUPPORTED (putty-7816-sonic-broadcom-j_master-n86-2019_09_21.log)
Nov  3 17:32:26.183742 sonic NOTICE syncd#syncd: :- syncd_main: is asic queue empty: 0
Nov  3 17:32:26.183742 sonic ERR syncd#syncd: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000033, status: SAI_STATUS_NOT_SUPPORTED
Nov  3 17:32:26.183742 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000033, status: SAI_STATUS_NOT_SUPPORTED
Nov  3 17:32:26.183742 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA
Nov  3 17:32:26.183882 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: notification send successfull
Nov  3 17:32:26.183882 sonic NOTICE syncd#syncd: :- syncd_main: Removing the switch gSwitchId=0xb970012100000000
Nov  3 17:32:26.184036 sonic INFO syncd.sh[22568]: requested COLD shutdown
Nov  3 17:32:28.759479 sonic INFO syncd#supervisord: syncd 0:soc_shutdown: soc_shutdown: all units detached#015
Nov  3 17:32:28.765808 sonic NOTICE syncd#syncd: :- syncd_main: remove switch took 1.666283 sec
Nov  3 17:32:28.765808 sonic NOTICE syncd#syncd: :- syncd_main: calling api uninitialize
Nov  3 17:32:28.765808 sonic NOTICE syncd#syncd: :- syncd_main: uninitialize finished
Nov  3 17:32:28.923618 sonic NOTICE syncd#syncd: :- threadFunction: ending timer watchdog thread
Nov  3 17:32:28.950394 sonic NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 0
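
To find the failing SAI call quickly in a long syslog, a small filter like the one below can help. It is a sketch that assumes the message layout seen in the excerpt above. The pattern and the function name `failed_sai_calls` are illustrative and are not part of syncd.

```python
import re

# Matches the first report of a failed SAI call as logged by syncd.
# The follow-up "Runtime error" line repeats the message with a different
# prefix, so it does not match and the failure is counted once.
FAILED_API = re.compile(
    r"syncd#syncd: :- processEvent: failed to execute api: (?P<api>\w+), "
    r"key: (?P<object_type>SAI_OBJECT_TYPE_\w+):oid:(?P<oid>0x[0-9a-fA-F]+), "
    r"status: (?P<status>SAI_STATUS_\w+)"
)


def failed_sai_calls(syslog_lines):
    """Return one dict per failed SAI API call found in the given lines."""
    failures = []
    for line in syslog_lines:
        m = FAILED_API.search(line)
        if m:
            failures.append(m.groupdict())
    return failures


# Two lines copied from the excerpt above.
LOG = [
    "Nov  3 17:32:26.183742 sonic ERR syncd#syncd: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000033, status: SAI_STATUS_NOT_SUPPORTED",
    "Nov  3 17:32:26.183742 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: remove, key: SAI_OBJECT_TYPE_PORT:oid:0x1000000000033, status: SAI_STATUS_NOT_SUPPORTED",
]

for failure in failed_sai_calls(LOG):
    print(failure["api"], failure["object_type"], failure["oid"], failure["status"])
```
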
  2. Reviewing the SONiC issue reports and the latest commits, we found that [Juniper][QFX5210] Workaround for orchagent crash #3458 reported the issue "Orchagent is crashing on the latest SONiC images", introduced by [Feature: DynamicPortBreakout] Use consolidated bcm file for Seastone platform #3240.

  3. We tested the workaround from [Juniper][QFX5210] Workaround for orchagent crash #3458 on the latest SONiC build image (SONiC.HEAD.86-dirty-20190921.062232) by commenting out all the loopback / management port SDK configuration. syncd came back up and all AS7816-64X ports are up (putty-7816-sonic-broadcom-j_master-n86-2019_09_21.log).

3.1). Comment out the loopback / management port entries

$ vi /usr/share/sonic/device/x86_64-accton_as7816_64x-r0/Accton-AS7816-64X/th2-as7816-64x100G.config.bcm
-------------------------------------
#add loopback port
# port 33 is the first loopback port
# portmap_33=260:10
# port 66 is the first management port
# portmap_66=257:10
# port 67 is the second loopback port
# portmap_67=261:10
# port 100 is the second management port
# portmap_100=259:10
# port 101 is the third loopback port
# portmap_101=262:10
# port 135 is the fourth loopback port
# portmap_135=263:10
-----------------------------------------------

3.2). Reboot the system

$ sudo reboot
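
The manual edit in step 3.1 could also be scripted. The sketch below only transforms the text of a config.bcm file. The function name `comment_out_portmaps` and the sample line `portmap_1=1:100` are hypothetical, and the port list is the one from the AS7816-64X file shown above. Back up the original file before writing any result back to the device.

```python
import re


def comment_out_portmaps(config_text, logical_ports):
    """Comment out portmap_<N>=... lines for the given logical port numbers."""
    wanted = {str(port) for port in logical_ports}
    out = []
    for line in config_text.splitlines():
        # Lines that are already commented start with '#' and do not match.
        m = re.match(r"\s*portmap_(\d+)(\.\d+)?\s*=", line)
        if m and m.group(1) in wanted:
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


# Loopback / management logical ports listed in step 3.1.
LOOPBACK_MGMT_PORTS = [33, 66, 67, 100, 101, 135]

# Hypothetical excerpt: one front-panel port, one active and one
# already-commented loopback / management entry.
sample = "portmap_1=1:100\nportmap_33=260:10\n# portmap_66=257:10\n"
print(comment_out_portmaps(sample, LOOPBACK_MGMT_PORTS), end="")
```

Front-panel entries are left untouched, and running the function twice does not add a second comment marker.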

**Output of `show version`:**

```
root@sonic:/home/admin# show platform summary
Platform: x86_64-accton_as7816_64x-r0
HwSKU: Accton-AS7816-64X
ASIC: broadcom
root@sonic:/home/admin# show ver

SONiC Software Version: SONiC.HEAD.86-dirty-20190921.062232
Distribution: Debian 9.11
Kernel: 4.9.0-9-2-amd64
Build commit: c60278de
Build date: Sat Sep 21 06:35:09 UTC 2019
Built by: johnar@jenkins-worker-4

Platform: x86_64-accton_as7816_64x-r0
HwSKU: Accton-AS7816-64X
ASIC: broadcom
Serial Number: 781664X0000000
Uptime: 17:55:37 up 1 min,  1 user,  load average: 0.52, 0.19, 0.07
```

**Attach debug file `sudo generate_dump`:**

putty-7816-sonic-broadcom-j_master-n86-2019_09_21.log

@pollyhsu2git changed the title from "AS7816-64X syncd Crashed due to Dynamic Breakout Port Feature commit" to "AS7816-64X syncd Crashed on latest due to Dynamic Breakout Port Feature commit" on Sep 25, 2019