
Adding BFD enhancements #3385

Closed

Conversation

SumitAgarwal123

- What I did
Enhancement as in HLD below
https://github.com/Azure/SONiC/blob/master/bfd/BFD_Enhancement_HLD.md

- How I did it
Modified existing code

- How to verify it
Added Python scripts to test all the unit test cases mentioned in the HLD.
Verification logs attached.
results_tests_ut_protocols_bfd_test_ut_bfd_mhop.log.txt
results_tests_ut_protocols_bfd_test_ut_bfd.log.txt
results_tests_ut_protocols_bfd_test_ut_bfd_trunk_lag.log.txt

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@SumitAgarwal123
Author

Retest this please

diff --git a/bfdd/bfd.c b/bfdd/bfd.c
index fc2d0bea9..abbd0e6a7 100644
--- a/bfdd/bfd.c
+++ b/bfdd/bfd.c
Collaborator

This patch is large, @pavel-shirshov. Is this really the way we want to manage FRR patches in SONiC, or should we take this patch into the sonic-frr repo?

Contributor

Yes. I'd prefer to have them as individual patches.
In this case, though, I'd try to push the changes into FRR itself.

Collaborator

This is not a patch anymore. Non-bug-fix changes must be upstreamed to FRR first.

Collaborator

All - I think our position here is that if we are required to upstream to FRR first, then this contribution will miss the current cycle (201908). This is because:

  1. We likely won't have time to get it through the FRR community in time.
  2. Doing so may require us to advance the FRR version in SONiC.

To avoid this we should proceed with the patch as presented, and upstream to FRR in parallel. We will then converge in a future FRR release.

Collaborator

We should avoid 1500+ lines of functional code changes being treated as a patch. Any change by frr or sonic that interacts with that patch/file, or even changes bfd functionality, will create issues since people (in both frr and sonic) will be unaware of these changes.

Have you at least opened up a PR in frr repo for these changes to get them upstreamed? What's the PR for it?

Collaborator

The FRR patch lacks a description; one needs to be added explaining what the enhancements are. Without a description, it is hard to know what the enhancements do. Then we should consider adding this to the sonic-frr 7.0.1 branch as commits, @pavel-shirshov. Meanwhile, I agree with @nikos-github: we need to at least open a PR against FRR and check the response from them.

Contributor

@lguohan as soon as we integrate such patches as part of our FRR, we're going to be stuck with one version of FRR. Do we want that? Also, as soon as we push the patch to our branch, it becomes our responsibility to fix issues in it.

@ben-gale why is your patch critical for the Aug branch of SONiC? Will the Aug branch work without it?

Collaborator

@pavel-shirshov The patch is needed for us to contribute the BFD feature to the SONiC community. Without it, no feature .....

Collaborator

@pavel-shirshov, I think Broadcom will maintain the patch. I do not think it will be us fixing those issues w.r.t. BFD. @ben-gale, correct?

Collaborator

Correct

@lguohan
Collaborator

lguohan commented Aug 29, 2019

There is a conflict with the bgpcfgd code, since there is a PR to refactor that code.

@@ -0,0 +1,1562 @@
diff --git a/bfdd/bfd.c b/bfdd/bfd.c
Collaborator

Is this patch upstreamed?

Author

I have submitted a pull request in FRR to upstream these changes.

diff --git a/bfdd/bfd.c b/bfdd/bfd.c
index fc2d0bea9..abbd0e6a7 100644
--- a/bfdd/bfd.c
+++ b/bfdd/bfd.c
Collaborator

This is not a patch anymore. Non-bug-fix changes must be upstreamed to FRR first.

Collaborator

@stephenxs left a comment

We can see in /usr/bin/start.sh that for the separated mode it makes use of bgpd.conf.j2 to generate the configuration, so I suggest updating bgpd.conf.j2 as well.
Similarly, src/sonic-config-engine/minigraph.py should also be changed in order to support loading this configuration from the minigraph.

@stephenxs
Collaborator

stephenxs commented Oct 27, 2019

Scheduling issues can also impact BFD, causing it to raise false alarms.
If the BFD daemon detects three consecutive echo packets lost, it declares the peer down. Typically this is caused by an issue on the peer or in the network, but sometimes it can also be caused by scheduling issues: for example, the BFD daemon is not scheduled for multiplier × rx-interval milliseconds. In this case, even if the packets arrive on time, the daemon has no opportunity to handle them and declares the peer down due to timeout. Neither can it send its own BFD packets on time, which causes the peer to declare the DUT down. This kind of neighbor-down event is a false alarm.
Typically this is caused by some other task hogging the CPU (failing to relinquish it for a long time) or by the system being busy. In this sense, any protocol with a keepalive mechanism and a timeout can be impacted. Among these protocols, BFD has the highest probability of being impacted because it has the shortest detection time, 300 ms. In addition, once a BFD neighbor goes down it notifies the related routing protocols, which withdraw the routes learnt from that neighbor; this can trigger another round of CPU load, forming a chain reaction.
So it makes sense to ensure BFD has the necessary CPU resources. IMO this can be achieved in several ways, like:
- adopting the FIFO scheduling policy and setting a proper priority for the threads handling BFD;
- using a cgroup to reserve some CPU time for the BFD daemon;
- adjusting the nice value.
BTW, I prefer the first option of the three.
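The first option (moving the BFD threads to SCHED_FIFO) can be sketched with Python's os scheduling API. This is a minimal sketch, assuming a Linux host with CAP_SYS_NICE; the priority value of 50 and the use of the current PID as a stand-in for bfdd are illustrative, not values from the HLD:

```python
import os

def set_rt_fifo(pid: int, priority: int = 50) -> bool:
    """Move the given process (e.g. bfdd) to the SCHED_FIFO real-time
    scheduling class. Returns False when the caller lacks the
    CAP_SYS_NICE capability (the usual case for non-root users)."""
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False

if __name__ == "__main__":
    pid = os.getpid()  # stand-in for the bfdd PID
    applied = set_rt_fifo(pid)
    # sched_getscheduler confirms which policy is actually in effect
    print("SCHED_FIFO applied:", applied,
          "current policy:", os.sched_getscheduler(pid))
```

A real-time policy trades fairness for latency, so a daemon under SCHED_FIFO should be well-behaved; an equivalent operational approach is `chrt -f 50 -p <pid>` on the running daemon.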

@stephenxs
Copy link
Collaborator

stephenxs commented Oct 27, 2019

It is also risky to have a BFD neighbor on a LAG interface, or a multihop BFD neighbor on an ECMP routing entry.
Consider the following scenario:

1. One of the member interfaces was down. When it comes up, LACP starts to negotiate with the peer.
2. After negotiation is done, the event has to be passed from teamd to orchagent, then to syncd, and eventually the switch chip is programmed by syncd. Only after that does the member interface really work.
3. However, on SONiC it can take seconds to reach that point, especially when other events (like routing updates) are pending in orchagent's queue.

In this case, if the member interface on the BFD neighbor switch becomes operational in a shorter time, and that neighbor happens to designate the just-recovered member interface as the one from which its BFD packets are sent, the SONiC switch is very likely to drop those BFD packets until its own member interface really works, and so declare the BFD neighbor down.
Currently this can be worked around by extending BFD's detection time to the maximum time it can take to program the switch chip. But I'm afraid that doesn't make sense, because that time can be a few seconds, which is much longer than BGP's current detection time of 3 seconds.
It is similar for multihop BFD with an ECMP routing entry.
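The workaround mentioned above (stretching BFD's detection time) would look roughly like this in FRR's bfdd configuration. This is a sketch: the peer address, interface name, and the 1000 ms intervals with a multiplier of 5 are illustrative assumptions, not values from the HLD:

```
bfd
 peer 10.0.0.2 interface PortChannel0001
  receive-interval 1000
  transmit-interval 1000
  detect-multiplier 5
 !
!
```

With a 1000 ms receive interval and a multiplier of 5, the detection time becomes 5 seconds, which illustrates the trade-off described here: long enough to ride out chip programming, but slower than the 3-second detection the feature is meant to beat.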

Collaborator

@stephenxs left a comment

As commented:

  1. suggest updating bgpd.conf.j2, which is used in the separated configuration mode.
  2. suggest adding corresponding test cases in sonic-mgmt.
  3. suggest adopting a proper scheduling policy for the BFD daemon, like SCHED_FIFO.
  4. suggest fixing the issue coming along with the new VM that supports BFD before BFD is merged.

@stephenxs
Collaborator

stephenxs commented Oct 29, 2019

BTW, I find that BFD is not supported on the current version of the VM we're using (4.15.10M). So far the version I found with BFD support is 4.22.0F.
But there is an issue while using that version: the output of "show version" on it doesn't contain the string "Arista", which is used to determine what kind of host it is (see sonic-mgmt/ansible/plugins/connection/switch.py#L101). Without "Arista" in "show version", some test cases can fail, like the following:

03:44:48 An exception occurred during task execution. The full traceback is:
03:44:48 Traceback (most recent call last):
03:44:48   File "/usr/local/lib/python2.7/dist-packages/ansible-2.0.0.2-py2.7.egg/ansible/executor/process/worker.py", line 114, in run
03:44:48     self._shared_loader_obj,
03:44:48   File "/usr/local/lib/python2.7/dist-packages/ansible-2.0.0.2-py2.7.egg/ansible/executor/task_executor.py", line 119, in run
03:44:48     res = self._execute()
03:44:48   File "/usr/local/lib/python2.7/dist-packages/ansible-2.0.0.2-py2.7.egg/ansible/executor/task_executor.py", line 401, in _execute
03:44:48     result = self._handler.run(task_vars=variables)
03:44:48   File "/builds2/jenkins2/workspace/SONiC_Test-LAG2-minlink/ansible/plugins/action/apswitch.py", line 61, in run
03:44:48     timeout=_timeout)
03:44:48   File "/builds2/jenkins2/workspace/SONiC_Test-LAG2-minlink/ansible/plugins/connection/switch.py", line 192, in exec_command
03:44:48     client = self._spawn_connect()
03:44:48   File "/builds2/jenkins2/workspace/SONiC_Test-LAG2-minlink/ansible/plugins/connection/switch.py", line 110, in _spawn_connect
03:44:48     if self.sku == 'mlnx_os':
03:44:48 AttributeError: 'Connection' object has no attribute 'sku'

In this sense, I suggest also having that issue fixed before merging BFD.
I have raised an issue for it.
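A host-type check that is less brittle than matching only "Arista" could key on several EOS markers. This is a hypothetical sketch, not the actual sonic-mgmt switch.py code; the marker list is an assumption:

```python
import re

# Hypothetical markers; the real sonic-mgmt check looks only for "Arista".
EOS_MARKERS = (r"\barista\b", r"\beos\b")

def detect_host_type(show_version_output: str) -> str:
    """Classify a device from its `show version` output."""
    text = show_version_output.lower()
    if any(re.search(pat, text) for pat in EOS_MARKERS):
        return "eos"
    if "sonic" in text:
        return "sonic"
    return "unknown"

if __name__ == "__main__":
    print(detect_host_type("Arista DCS-7050QX-32"))        # eos
    print(detect_host_type("EOS version 4.22.0F"))         # eos
    print(detect_host_type("SONiC Software Version"))      # sonic
```

Whatever the final heuristic, initializing the `sku` attribute before the platform check would also turn the AttributeError above into a clean "unknown host" failure.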

@SumitAgarwal123
Author

(quoting @stephenxs's comment above on scheduling issues impacting BFD)

Refer to Section 7 of the HLD, BFD packet trapping to CPU: BFD packets can be put in a priority queue.

Further, we have done extensive testing in a scaled setup, and we haven't come across a situation where BFD did not get enough CPU resources.

@stephenxs
Collaborator

(quoting the exchange above on scheduling issues and the BFD priority queue)

Hi,
Yes, by putting the BFD packets into a priority queue the DUT can receive them even when many packets are pouring into the CPU.
However, this is only the first step of BFD packet handling. After that, the packets traverse from the softirq thread to the main thread of bfdd, if there are no other threads in between. From the OS's point of view, there is a scheduling latency: the time from when a thread enters the "ready" state to when it is actually scheduled. If the system is busy, this latency can be large and BFD can time out. It's similar on the sending side: BFD can fail to send packets on time when the system is busy.
I know the probability of a BFD timeout due to a busy system is very slim, but I think it's better to take steps to make sure BFD works well even when the system is busy.

@SumitAgarwal123
Author

(quoting the exchange above on scheduling latency and the BFD priority queue)

Thanks for the inputs; we can look into these as separate BFD enhancements.

@stephenxs
Collaborator

retest this, please.

mssonicbld added a commit that referenced this pull request Jul 19, 2024
…atically (#19632)

#### Why I did it
src/sonic-utilities
```
* 2ffd0c96 - (HEAD -> 202405, origin/202405) [SfpUtil] sfp eeprom with option dom is not working on Xcvrs with flat memory (#3385) (10 minutes ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Jul 19, 2024
…atically (#19627)

#### Why I did it
src/sonic-utilities
```
* f2b76213 - (HEAD -> master, origin/master, origin/HEAD) [SfpUtil] sfp eeprom with option dom is not working on Xcvrs with flat memory (#3385) (4 hours ago) [mihirpat1]
* fd3096c7 - Enable show ip bgp on sup and -n all for show ip bgp network (#3417) (9 hours ago) [Changrong Wu]
```
#### How I did it
#### How to verify it
#### Description for the changelog