[muxorch] handling multiple mux nexthops for route #2656
Conversation
Force-pushed from 266395b to 312f41a
Please provide the Description with all details as per the template.
Ah, my bad, that usually auto-generates from my commit message, but I had a draft PR open before and it didn't update. I'll fix that.
Force-pushed from 01ee74d to adb5268
/azp run
Commenter does not have sufficient privileges for PR 2656 in repo sonic-net/sonic-swss
/azpw run
/AzurePipelines run
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
What I did: added tests/dualtor/test_multi_mux_nexthop_route.py, which tests the use case of multiple nexthop neighbors across different MUX ports. DEPENDS ON: sonic-net/sonic-swss#2656
Why I did it: test coverage of this scenario.
How I did it: the test creates a route to 2 different interfaces and neighbors, then validates that traffic is received from the expected ports.
What I did: added logic to handle when a route points to a nexthop group with mux neighbors. In this case, only one active neighbor, or the tunnel nexthop, will be programmed to the ASIC.
Why I did it: having a route with multiple mux neighbors caused a data loop which led to packet loss when different neighbors were in different states.
How I did it: added logic to update routes when a route is changed, a mux neighbor is changed, or there is a mux state change.
HLD: sonic-net/SONiC#1256
Signed-off-by: Nikola Dancejic <[email protected]>
Signed-off-by: Nikola Dancejic <[email protected]>
Signed-off-by: Nikola Dancejic <[email protected]>
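The programming rule described in the PR description (exactly one active mux neighbor, else the tunnel nexthop) can be sketched roughly as follows. The type and function names here are hypothetical simplifications for illustration, not the actual muxorch implementation:

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of the rule described above: for a route
// whose nexthop group contains mux neighbors, program exactly one active
// neighbor to the ASIC, or fall back to the tunnel nexthop when none is
// active (avoiding the data loop between neighbors in different states).
struct MuxNexthop
{
    std::string ip;
    bool active;  // mux port state of this neighbor
};

// Returns the IP of the first active neighbor, or std::nullopt to signal
// that the tunnel nexthop should be programmed instead.
std::optional<std::string> selectMuxNexthop(const std::vector<MuxNexthop>& nexthops)
{
    for (const auto& nh : nexthops)
    {
        if (nh.active)
        {
            return nh.ip;  // program only this neighbor to the ASIC
        }
    }
    return std::nullopt;  // all standby: route via the tunnel nexthop
}
```

The same selection would be re-run on route changes, neighbor changes, and mux state changes, which is why all three update paths appear in the description.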
orchagent/muxorch.cpp (Outdated)

if (it != mux_multi_nh_route_tb.end())
{
    MuxCable* cable = findMuxCableInSubnet(it->second.ip_address);
    if (cable == nullptr || cable->isActive())
If I'm reading this correctly, if our route nexthop is not one of the configured mux cable IPs, we take no action. This seems a bit optimistic to me, unless we can 100% guarantee that the multi-mux-nexthop scenario only occurs with nexthop IPs that are configured as mux cable IPs?
From my understanding of discussions with Prince, we currently handle neighbors on the Vlan that aren't tied to a mux cable as active. Maybe we can have a discussion on this?
I think findMuxCableInSubnet only returns a cable for a directly connected server IP (one that is in config_db), not for all learned neighbors. But I think we have to handle it if the nexthop belongs to any mux cable.
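To make the concern in this thread concrete, here is a toy sketch of the two possible policies for a nexthop whose mux cable lookup comes back empty. The names are hypothetical, not the muxorch API:

```cpp
#include <optional>

// Toy model of the policy debated above. A cable lookup yields a known
// state (active/standby), or nothing when the nexthop IP does not map to
// a configured mux cable. The code under review effectively treats "no
// cable found" the same as active; the review suggests standby is safer
// unless the nexthop is guaranteed not to belong to any mux cable.
enum class CableState { Active, Standby };

bool shouldUseNeighbor(std::optional<CableState> lookup, bool treatUnknownAsActive)
{
    if (!lookup.has_value())
    {
        // Nexthop is not tied to any known mux cable.
        return treatUnknownAsActive;
    }
    return *lookup == CableState::Active;
}
```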
orchagent/muxorch.cpp (Outdated)

{
    NextHopKey nexthop = *it;
    MuxCable* cable = findMuxCableInSubnet(nexthop.ip_address);
    if (cable == nullptr || (cable->isActive() && !active_found))
same as above comment - if the nexthop IP isn't a mux cable IP, it seems safer to assume it's standby
same as above
# move neighbor 2 back
self.add_neighbor(dvs, neighbors[0], macs[i])

self.del_route(dvs, route)
Would prefer if this test case were parameterized so that we could use a fixture to do setup/teardown in case something goes wrong in the middle of the test. Depending on urgency, we can merge as-is and discuss this change in the future.
Yeah, I agree with this. Currently I think the try/finally block will tear down the resources that I'm using in this test, but I can definitely rework it in that way.
orchagent/routeorch.cpp (Outdated)

/* Check if nexthop is mux nexthop */
MuxOrch* mux_orch = gDirectory.get<MuxOrch*>();
NextHopGroupKey nhg_key;
if (inNextHopGroup(nextHop, nhg_key) && mux_orch->isMuxNexthops(nhg_key))
What happens if a nexthop is part of an NHG, but another route also points to that single nexthop?
For example, like NH1 below:
R1 -> NHG1 (NH1, NH2)
R2 -> NH1
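A minimal sketch of why this overlap matters (hypothetical bookkeeping, not the actual routeorch data structures): a check that only answers "is this nexthop in some NHG?" would return true for NH1 even when processing R2's direct route, so group membership and direct route references need to be tracked separately:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical tracking of how a nexthop is referenced: via nexthop
// groups (e.g. NH1 as a member of NHG1 used by R1) and directly by
// single-nexthop routes (e.g. R2 -> NH1). Both kinds of references
// would have to be consulted before reprogramming the nexthop on a
// mux state change, or R2 could be mishandled.
struct NexthopRefs
{
    std::set<std::string> groups;  // NHGs that contain this nexthop
    int directRoutes = 0;          // routes pointing at it directly
};

using NexthopTable = std::map<std::string, NexthopRefs>;

bool inAnyNextHopGroup(const NexthopTable& tbl, const std::string& nh)
{
    auto it = tbl.find(nh);
    return it != tbl.end() && !it->second.groups.empty();
}

bool hasDirectRoute(const NexthopTable& tbl, const std::string& nh)
{
    auto it = tbl.find(nh);
    return it != tbl.end() && it->second.directRoutes > 0;
}
```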
Signed-off-by: Nikola Dancejic <[email protected]>
@Ndancejic cherry-pick conflict to 202211, could you raise a separate PR for the 202211 branch? @prsunny for vis.
* [muxorch] handling multiple mux nexthops for route What I did: added logic to handle when a route points to a nexthop group with mux neighbors. In this case, only one active neighbor, or the tunnel nexthop, will be programmed to the ASIC. Why I did it: having a route with multiple mux neighbors caused a data loop which led to packet loss when different neighbors were in different states. How I did it: added logic to update routes when a route is changed, a mux neighbor is changed, or there is a mux state change. HLD: sonic-net/SONiC#1256 Signed-off-by: Nikola Dancejic <[email protected]>