
Adding BFD enhancements #3385

Closed

Conversation

SumitAgarwal123

- What I did
Enhancement as in HLD below
https://github.com/Azure/SONiC/blob/master/bfd/BFD_Enhancement_HLD.md

- How I did it
Modified existing code

- How to verify it
Added Python scripts to test all the unit test cases mentioned in the HLD.
Verification logs attached.
results_tests_ut_protocols_bfd_test_ut_bfd_mhop.log.txt
results_tests_ut_protocols_bfd_test_ut_bfd.log.txt
results_tests_ut_protocols_bfd_test_ut_bfd_trunk_lag.log.txt

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@SumitAgarwal123
Author

Retest this please

diff --git a/bfdd/bfd.c b/bfdd/bfd.c
index fc2d0bea9..abbd0e6a7 100644
--- a/bfdd/bfd.c
+++ b/bfdd/bfd.c
Collaborator

This patch is large, @pavel-shirshov. Is this really the way we want to manage FRR patches in SONiC, or should we take this patch into the sonic-frr repo?

Contributor

Yes. I'd prefer to have them as individual patches.
In this case, though, I'd try to push the changes into FRR itself.

Collaborator

This is not a patch anymore. Non-bug-fix changes must be upstreamed to FRR first.

Collaborator

All - I think our position here is that if we are required to upstream to FRR first, then this contribution will miss the current cycle (201908). This is because:

  1. We likely won't have time to get it through the FRR community in time.
  2. Doing so may require us to advance the FRR version in SONiC.

To avoid this we should proceed with the patch as presented, and upstream to FRR in parallel. We will then converge in a future FRR release.

Collaborator

We should avoid 1500+ lines of functional code changes being treated as a patch. Any change by frr or sonic that interacts with that patch/file, or even changes bfd functionality, will create issues since people (in both frr and sonic) will be unaware of these changes.

Have you at least opened up a PR in frr repo for these changes to get them upstreamed? What's the PR for it?

Collaborator

The FRR patch lacks a description; one needs to be added explaining what the enhancements are. Without a description, it is hard to know what the enhancements do. Then we should consider adding this to the sonic-frr 7.0.1 branch as commits, @pavel-shirshov. Meanwhile, I agree with @nikos-github: we need to at least open a PR against FRR and check the response from them.

Contributor

@lguohan as soon as we integrate such patches as part of our FRR, we're going to be stuck with one version of FRR. Do we want that? Also, as soon as we push the patch to our branch, it becomes our responsibility to fix issues in it.

@ben-gale why is your patch critical for the Aug branch of SONiC? Will the Aug branch work without it?

Collaborator

@pavel-shirshov The patch is needed for us to contribute the BFD feature to the SONiC community. Without it, no feature .....

Collaborator

@pavel-shirshov, I think Broadcom will maintain the patch. I do not think it will be us fixing those issues w.r.t. BFD. @ben-gale, correct?

Collaborator

Correct

@lguohan
Collaborator

lguohan commented Aug 29, 2019

There is a conflict with the bgpcfgd code, since there is a PR to refactor that code.

@@ -0,0 +1,1562 @@
diff --git a/bfdd/bfd.c b/bfdd/bfd.c
Collaborator

Is this patch upstreamed?

Author

I have submitted a pull request in FRR to upstream these changes.

diff --git a/bfdd/bfd.c b/bfdd/bfd.c
index fc2d0bea9..abbd0e6a7 100644
--- a/bfdd/bfd.c
+++ b/bfdd/bfd.c
Collaborator

This is not a patch anymore. Non-bug-fix changes must be upstreamed to FRR first.

Collaborator

@stephenxs left a comment

We can see in /usr/bin/start.sh that for the separated mode it makes use of bgpd.conf.j2 to generate the configuration, so I suggest updating bgpd.conf.j2 as well.
Similarly, src/sonic-config-engine/minigraph.py should also be changed in order to support loading this configuration from the minigraph.

@stephenxs
Collaborator

stephenxs commented Oct 27, 2019

Scheduling issues can also impact BFD, causing it to raise false alarms.
If the BFD daemon detects three consecutive echo packets lost, it declares the peer down. Typically this is caused by an issue on the peer or in the network, but sometimes it can also be caused by scheduling issues: for example, the BFD daemon is not scheduled for multiplier × rx-interval milliseconds. In this case, even if the packets arrive on time, the daemon has no opportunity to handle them and declares the peer down due to timeout. Neither can it send its own BFD packets on time, which causes the peer to declare the DUT down. This kind of neighbor-down event is a false alarm.
Typically this is caused by some other task hogging the CPU (failing to relinquish it for a long time) or by the system being busy. In this sense, any protocol with a keepalive mechanism and a timeout can be impacted. Among these protocols, BFD has the highest probability of being impacted because it has the shortest detection time, 300 ms. In addition, once a BFD neighbor goes down it notifies the related routing protocols, which withdraw the routes learnt from that neighbor; this can trigger another round of CPU load, forming a chain reaction.
So it makes sense to ensure BFD has the necessary CPU resources. IMO this can be achieved in several ways, like:
- adopting the FIFO scheduling policy and setting a proper priority for the threads handling BFD;
- using a cgroup to reserve some CPU time for the BFD daemon;
- adjusting the nice value.
BTW, I prefer the first option of the three.
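The first option (moving the BFD threads to SCHED_FIFO) can be sketched with Python's os scheduling API. This is a minimal sketch, assuming a Linux host with CAP_SYS_NICE; the priority value of 50 and the use of the current PID as a stand-in for bfdd are illustrative, not values from the HLD:

```python
import os

def set_rt_fifo(pid: int, priority: int = 50) -> bool:
    """Move the given process (e.g. bfdd) to the SCHED_FIFO real-time
    scheduling class. Returns False when the caller lacks the
    CAP_SYS_NICE capability (the usual case for non-root users)."""
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False

if __name__ == "__main__":
    pid = os.getpid()  # stand-in for the bfdd PID
    applied = set_rt_fifo(pid)
    # sched_getscheduler confirms which policy is actually in effect
    print("SCHED_FIFO applied:", applied,
          "current policy:", os.sched_getscheduler(pid))
```

A real-time policy trades fairness for latency, so a daemon under SCHED_FIFO should be well-behaved; an equivalent operational approach is `chrt -f 50 -p <pid>` on the running daemon.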

@stephenxs
Copy link
Collaborator

stephenxs commented Oct 27, 2019

It is also risky to have a BFD neighbor on a LAG interface, or a multihop BFD neighbor on an ECMP routing entry.
Consider the following scenario:

1. One of the member interfaces was down. When it comes up, LACP starts to negotiate with the peer.
2. After negotiation is done, the event has to be passed from teamd to orchagent, then to syncd, and eventually the switch chip is programmed by syncd. Only after that does the member interface really work.
3. However, on SONiC it can take seconds to reach that point, especially when other events (like routing updates) are pending in orchagent's queue.

In this case, if the member interface on the BFD neighbor switch becomes operational in a shorter time, and that neighbor happens to designate the just-recovered member interface as the one from which its BFD packets are sent, the SONiC switch is very likely to drop those BFD packets until its own member interface really works, and so declare the BFD neighbor down.
Currently this can be worked around by extending BFD's detection time to the maximum time it can take to program the switch chip. But I'm afraid that doesn't make sense, because that time can be a few seconds, which is much longer than BGP's current detection time of 3 seconds.
It is similar for multihop BFD with an ECMP routing entry.
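The workaround mentioned above (stretching BFD's detection time) would look roughly like this in FRR's bfdd configuration. This is a sketch: the peer address, interface name, and the 1000 ms intervals with a multiplier of 5 are illustrative assumptions, not values from the HLD:

```
bfd
 peer 10.0.0.2 interface PortChannel0001
  receive-interval 1000
  transmit-interval 1000
  detect-multiplier 5
 !
!
```

With a 1000 ms receive interval and a multiplier of 5, the detection time becomes 5 seconds, which illustrates the trade-off described here: long enough to ride out chip programming, but slower than the 3-second detection the feature is meant to beat.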

Collaborator

@stephenxs left a comment

As commented:

  1. suggest updating bgpd.conf.j2, which is used in the separated configuration mode.
  2. suggest adding corresponding test cases in sonic-mgmt.
  3. suggest adopting a proper scheduling policy for the BFD daemon, like SCHED_FIFO.
  4. suggest fixing the issue coming along with the new VM that supports BFD before BFD is merged.

@stephenxs
Collaborator

stephenxs commented Oct 29, 2019

BTW, I find that BFD is not supported on the current version of the VM we're using (4.15.10M). So far the version I found with BFD support is 4.22.0F.
But there is an issue while using that version: the output of "show version" on it doesn't contain the string "Arista", which is used to determine what kind of host it is (see sonic-mgmt/ansible/plugins/connection/switch.py#L101). Without "Arista" in "show version", some test cases can fail, like the following:

03:44:48 An exception occurred during task execution. The full traceback is:
03:44:48 Traceback (most recent call last):
03:44:48   File "/usr/local/lib/python2.7/dist-packages/ansible-2.0.0.2-py2.7.egg/ansible/executor/process/worker.py", line 114, in run
03:44:48     self._shared_loader_obj,
03:44:48   File "/usr/local/lib/python2.7/dist-packages/ansible-2.0.0.2-py2.7.egg/ansible/executor/task_executor.py", line 119, in run
03:44:48     res = self._execute()
03:44:48   File "/usr/local/lib/python2.7/dist-packages/ansible-2.0.0.2-py2.7.egg/ansible/executor/task_executor.py", line 401, in _execute
03:44:48     result = self._handler.run(task_vars=variables)
03:44:48   File "/builds2/jenkins2/workspace/SONiC_Test-LAG2-minlink/ansible/plugins/action/apswitch.py", line 61, in run
03:44:48     timeout=_timeout)
03:44:48   File "/builds2/jenkins2/workspace/SONiC_Test-LAG2-minlink/ansible/plugins/connection/switch.py", line 192, in exec_command
03:44:48     client = self._spawn_connect()
03:44:48   File "/builds2/jenkins2/workspace/SONiC_Test-LAG2-minlink/ansible/plugins/connection/switch.py", line 110, in _spawn_connect
03:44:48     if self.sku == 'mlnx_os':
03:44:48 AttributeError: 'Connection' object has no attribute 'sku'

In this sense, I suggest also having that issue fixed before merging BFD.
I have raised an issue for it.
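A host-type check that is less brittle than matching only "Arista" could key on several EOS markers. This is a hypothetical sketch, not the actual sonic-mgmt switch.py code; the marker list is an assumption:

```python
import re

# Hypothetical markers; the real sonic-mgmt check looks only for "Arista".
EOS_MARKERS = (r"\barista\b", r"\beos\b")

def detect_host_type(show_version_output: str) -> str:
    """Classify a device from its `show version` output."""
    text = show_version_output.lower()
    if any(re.search(pat, text) for pat in EOS_MARKERS):
        return "eos"
    if "sonic" in text:
        return "sonic"
    return "unknown"

if __name__ == "__main__":
    print(detect_host_type("Arista DCS-7050QX-32"))        # eos
    print(detect_host_type("EOS version 4.22.0F"))         # eos
    print(detect_host_type("SONiC Software Version"))      # sonic
```

Whatever the final heuristic, initializing the `sku` attribute before the platform check would also turn the AttributeError above into a clean "unknown host" failure.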

@SumitAgarwal123
Author

(quoting @stephenxs's comment above on scheduling issues impacting BFD)

Refer to Section 7 of the HLD, BFD packet trapping to CPU: BFD packets can be put in a priority queue.

Further, we have done extensive testing in a scaled setup, and we haven't come across a situation where BFD did not get enough CPU resources.

@stephenxs
Collaborator

(quoting the exchange above on scheduling issues and the BFD priority queue)

Hi,
Yes, by putting the BFD packets into a priority queue the DUT can receive them even when many packets are pouring into the CPU.
However, this is only the first step of BFD packet handling. After that, the packets traverse from the softirq thread to the main thread of bfdd, if there are no other threads in between. From the OS's point of view, there is a scheduling latency: the time from when a thread enters the "ready" state to when it is actually scheduled. If the system is busy, this latency can be large and BFD can time out. It's similar on the sending side: BFD can fail to send packets on time when the system is busy.
I know the probability of a BFD timeout due to a busy system is very slim, but I think it's better to take steps to make sure BFD works well even when the system is busy.

@SumitAgarwal123
Author

(quoting the exchange above on scheduling latency and the BFD priority queue)

Thanks for the inputs; we can look into these as separate BFD enhancements.

@stephenxs
Collaborator

retest this, please.

mssonicbld added a commit that referenced this pull request Jul 19, 2024
…atically (#19632)

#### Why I did it
src/sonic-utilities
```
* 2ffd0c96 - (HEAD -> 202405, origin/202405) [SfpUtil] sfp eeprom with option dom is not working on Xcvrs with flat memory (#3385) (10 minutes ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Jul 19, 2024
…atically (#19627)

#### Why I did it
src/sonic-utilities
```
* f2b76213 - (HEAD -> master, origin/master, origin/HEAD) [SfpUtil] sfp eeprom with option dom is not working on Xcvrs with flat memory (#3385) (4 hours ago) [mihirpat1]
* fd3096c7 - Enable show ip bgp on sup and -n all for show ip bgp network (#3417) (9 hours ago) [Changrong Wu]
```
#### How I did it
#### How to verify it
#### Description for the changelog