[202405] delayed ssh service start due to dependency caused by banner-config service #19661

Closed
anamehra opened this issue Jul 23, 2024 · 19 comments · Fixed by #21264
Labels: Issue for 202405, NVIDIA, Triaged

Comments

@anamehra
Contributor

Description

Some sonic-mgmt tests fail after an RP reboot, reporting the host as unreachable.

On debugging, we observed that the ssh service takes longer to start on the 202405 image than on the 202305 image.

A new service, banner-config.service, was introduced in 202405.
It is set up as:

Requires=config-setup.service
After=config-setup.service
Before=systemd-logind.service
Before=sshd.service

config-setup.service itself depends on the database service, so the ssh service is pushed to start after both config-setup and banner-config.
Without the banner-config service, ssh had no dependency on database and used to start before the database service.
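
To make the ordering chain concrete, here is a minimal Python sketch that models the directives above as "must start after" edges and walks them transitively. It is an illustration only, not how systemd resolves ordering internally. The unit name `database.service` and the single edge from config-setup to it are assumptions; the full dependency list of config-setup.service is not shown in this issue.

```python
# Ordering edges: "key must start after every unit in its value set".
# Assumption: config-setup's dependency on the database service is modeled
# as one edge to a unit named "database.service".
AFTER = {
    "banner-config.service": {"config-setup.service"},
    "config-setup.service": {"database.service"},
    # Before=sshd.service and Before=systemd-logind.service in banner-config
    # act like After=banner-config.service on those two units.
    "sshd.service": {"banner-config.service"},
    "systemd-logind.service": {"banner-config.service"},
}


def must_wait_for(unit, after=AFTER):
    """Return every unit that has to start before `unit` (transitively)."""
    seen = set()
    stack = [unit]
    while stack:
        for dep in after.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# With Before=sshd.service in place, sshd inherits the whole chain.
print(sorted(must_wait_for("sshd.service")))

# Without that edge, sshd no longer waits for database at all.
without_banner = {k: v for k, v in AFTER.items() if k != "sshd.service"}
print(sorted(must_wait_for("sshd.service", without_banner)))
```

The first call lists banner-config, config-setup and database; the second returns an empty list, which matches the behaviour described for images without banner-config.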


@anamehra
Contributor Author

Hi @rlhui, @abdosi, for your visibility. Thanks

@anamehra
Contributor Author

@SviatoslavBoichuk, please advise on this. Thanks

@qiluo-msft
Collaborator

@liat-grozovik @SviatoslavBoichuk Could you help troubleshoot this issue?

judyjoseph added the Triaged label Jul 31, 2024
@abdosi
Contributor

abdosi commented Aug 28, 2024

@abdosi: will bring this to Nvidia's attention.

@anamehra: to add a data point on how much the delay increased from 202205 to 202405.

@anamehra
Contributor Author

anamehra commented Sep 1, 2024

Hi @abdosi, if I remove banner-config.service, I observe that on the chassis RP
the ssh service starts ~2 min before the database service.

root@sfd-t2-sup:/home/cisco# systemctl status ssh| grep Active
     Active: active (running) since Sun 2024-09-01 06:50:23 UTC; 33min ago
root@sfd-t2-sup:/home/cisco# systemctl status database| grep Active
     Active: active (running) since Sun 2024-09-01 06:52:22 UTC; 31min ago

With banner-config.service, ssh started 51 s after the database service:

root@sfd-t2-sup:/home/cisco# systemctl status database| grep Active
     Active: active (running) since Sat 2024-08-31 18:36:50 UTC; 12h ago
root@sfd-t2-sup:/home/cisco# systemctl status ssh| grep Active
     Active: active (running) since Sat 2024-08-31 18:37:41 UTC; 12h ago

So that's a ~2 min 50 s delay introduced on the chassis RP by banner-config.
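
The arithmetic behind that figure can be checked with a small Python sketch that parses the `Active:` lines from the two boots above. The helper names `started_at` and `ssh_offset` are made up for this illustration; the input lines are copied from the output in this comment.

```python
import re
from datetime import datetime

# Matches the "since ..." timestamp printed by `systemctl status <unit>`.
SINCE = re.compile(r"since \w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")


def started_at(active_line):
    """Parse the start time out of one 'Active:' line."""
    match = SINCE.search(active_line)
    if match is None:
        raise ValueError(f"no timestamp in: {active_line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


def ssh_offset(ssh_line, database_line):
    """Seconds by which ssh started after database (negative means before)."""
    return (started_at(ssh_line) - started_at(database_line)).total_seconds()


# Boot without banner-config.service.
without_banner = ssh_offset(
    "Active: active (running) since Sun 2024-09-01 06:50:23 UTC; 33min ago",
    "Active: active (running) since Sun 2024-09-01 06:52:22 UTC; 31min ago",
)
# Boot with banner-config.service.
with_banner = ssh_offset(
    "Active: active (running) since Sat 2024-08-31 18:37:41 UTC; 12h ago",
    "Active: active (running) since Sat 2024-08-31 18:36:50 UTC; 12h ago",
)

print(without_banner)                # -119.0
print(with_banner)                   # 51.0
print(with_banner - without_banner)  # 170.0, i.e. 2 min 50 s
```

Note that this compares ssh against database within each boot, so it measures the shift relative to database start rather than absolute time from power-on.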

@abdosi
Contributor

abdosi commented Sep 26, 2024

@mlok-nokia @kenneth-arista: Are you seeing a similar issue?

@abdosi
Contributor

abdosi commented Sep 26, 2024

@qiluo-msft: Can you check this?

@abdosi
Contributor

abdosi commented Oct 9, 2024

@mlok-nokia @arlakshm

@arista-nwolfe
Contributor

arista-nwolfe commented Oct 10, 2024

Here is the data from Arista on 202405: the ssh service starts only 5 s after the database service with banner-config.service, and 7 s before the database service without it.
With banner-config.service:

root@cmp214-5:~# systemctl status ssh| grep Active
     Active: active (running) since Thu 2024-10-10 08:17:01 UTC; 10h ago
root@cmp214-5:~# systemctl status database| grep Active
     Active: active (running) since Thu 2024-10-10 08:16:56 UTC; 10h ago

Without banner-config.service:

root@cmp214-5:~# rm /run/systemd/generator/sonic.target.wants/banner-config.service
root@cmp214-5:~# rm /usr/lib/systemd/system/banner-config.service
root@cmp214-5:~# reboot
root@cmp214-5:~# systemctl status ssh| grep Active
     Active: active (running) since Thu 2024-10-10 18:56:26 UTC; 1h 14min ago
root@cmp214-5:~# systemctl status database| grep Active
     Active: active (running) since Thu 2024-10-10 18:56:33 UTC; 1h 14min ago

@anamehra
Contributor Author

Hi @arista-nwolfe, is this data from the chassis supervisor? If not, could you please compare on the chassis supervisor? Thanks

@arista-nwolfe
Contributor

Hi @anamehra, sorry, I was running that on the linecard.
When I run it on the supervisor, I see numbers closer to what you're seeing.
With banner-config.service:

root@cmp214:~# systemctl status ssh| grep Active
     Active: active (running) since Tue 2024-10-15 21:12:31 UTC; 12min ago
root@cmp214:~# systemctl status database| grep Active
     Active: active (running) since Tue 2024-10-15 21:11:56 UTC; 12min ago

Without banner-config.service:

root@cmp214:~# systemctl status ssh| grep Active
     Active: active (running) since Tue 2024-10-15 21:29:28 UTC; 1h 11min ago
root@cmp214:~# systemctl status database| grep Active
     Active: active (running) since Tue 2024-10-15 21:29:39 UTC; 1h 11min ago

@rlhui
Contributor

rlhui commented Nov 6, 2024

@bingwang-ms - can we bring this issue up with Nvidia? It looks like a new service (banner-config.service) introduced by Nvidia affected boot timing and caused some tests to fail. @liat-grozovik @SviatoslavBoichuk

@bingwang-ms
Contributor

@volodymyrsamotiy Can you help find the right owner for this issue? We will discuss it in the next meeting.

@bingwang-ms
Contributor

Discussed in issue triage meeting. ETA of fix is 11/25.
@fastiuk , @abdosi FYI.

@yejianquan
Contributor

Hi @fastiuk, could you please share a progress update on this? Thanks

@bingwang-ms
Contributor

Discussed in weekly issue triage meeting, PR will be raised soon by @fastiuk

@fastiuk
Contributor

fastiuk commented Dec 23, 2024

PR with the fix is here: #21264

@volodymyrsamotiy FYI

@mlok-nokia
Contributor

mlok-nokia commented Dec 23, 2024

Somehow, our OC run is OK. We are not sure whether we delayed the OC testing in our private branch. However, with the 202405 image we saw sshd start 2 min 28 s after all databases had started.

  1. Image 202405:
admin@ixre-cpm-chassis7:~$ sudo systemctl status sshd | grep active
     Active: active (running) since Sat 2024-12-21 02:39:32 UTC; 2 days ago
             └─1933760 grep active
admin@ixre-cpm-chassis7:~$ sudo systemctl status database@8 | grep active
     Active: active (running) since Sat 2024-12-21 02:37:04 UTC; 2 days ago
  2. Image 202205, sshd started about 1 min 7 s before the databases started:
admin@ixre-cpm-chassis16:~$ sudo systemctl status database@10 | grep active
     Active: active (running) since Sat 2024-11-23 15:52:50 UTC; 4 weeks 1 days ago
admin@ixre-cpm-chassis16:~$ sudo systemctl status sshd.service | grep active
     Active: active (running) since Sat 2024-11-23 15:51:43 UTC; 4 weeks 1 days ago
             └─1954793 grep active

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Dec 24, 2024
- Why I did it
Fixes sonic-net#19661

- How I did it
Removed the dependency on the sshd service. The banner service saves the config data to the file after setting the config, so it is available after boot in any case.

- How to verify it
Reboot the switch and wait for the serial prompt. SSH should become available after a couple of seconds.

Signed-off-by: Yevhen Fastiuk <[email protected]>
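
The verification step in this commit message can also be timed from a remote machine. Below is a minimal Python sketch, assuming plain TCP reachability of port 22 is a good enough proxy for "SSH is available"; the function name `wait_for_port` and its defaults are hypothetical and not part of sonic-mgmt or the fix.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns the number of seconds waited; raises TimeoutError if the port
    never accepts a connection within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        try:
            # Opening and immediately closing the connection is enough to
            # show that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= timeout:
                raise TimeoutError(
                    f"{host}:{port} not reachable after {timeout}s"
                )
            time.sleep(interval)
```

For example, calling `wait_for_port("<switch-hostname>")` right after issuing a reboot gives a rough figure for how long the host stayed unreachable. It does not confirm that login works, only that the port accepts connections.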
mssonicbld pushed a commit that referenced this issue Dec 25, 2024
github-actions bot pushed a commit to bradh352/sonic-buildimage that referenced this issue Jan 2, 2025
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this issue Jan 21, 2025
@yejianquan
Contributor

yejianquan commented Feb 13, 2025

On the Cisco chassis, the sshd delay issue is resolved:
RP: the ssh service is up 2 min before database.
[Image]

LC: the ssh service is up 1.5 min before database.
[Image]

Projects
Status: Done