[muxorch] handling multiple mux nexthops for route #2656

Merged
merged 5 commits into from
Apr 10, 2023

Conversation

@Ndancejic Ndancejic commented Feb 7, 2023

What I did: added logic to handle routes that point to a nexthop
group containing mux neighbors. In this case, only one nexthop is
programmed to the ASIC: either a single active neighbor or the tunnel
nexthop.

Why I did it: a route with multiple mux neighbors caused a data
loop, which led to packet loss when different neighbors were in
different states.

How I did it: added logic to update routes when a route is changed, a
mux neighbor is changed, or there is a mux state change.
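
A minimal Python sketch of the behavior described above, not the C++ code in muxorch: for a route with several mux nexthops it selects the first active neighbor, or the tunnel if none is active, and re-selects on each of the three triggers. All names here (`Nexthop`, `RouteTable`, `select_nexthop`, `TUNNEL`) and the string-valued mux states are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

TUNNEL = "tunnel"  # hypothetical label for the tunnel nexthop


@dataclass(frozen=True)
class Nexthop:
    ip: str
    port: str  # mux port the neighbor is learned on (assumed model)


def select_nexthop(nexthops: List[Nexthop], mux_state: Dict[str, str]) -> str:
    """Return the one nexthop to program for a multi-mux-nexthop route."""
    for nh in nexthops:
        if mux_state.get(nh.port) == "active":
            return nh.ip  # first active neighbor wins
    return TUNNEL  # no active neighbor: fall back to the tunnel


class RouteTable:
    """Re-selects the programmed nexthop on each of the three triggers."""

    def __init__(self) -> None:
        self.mux_state: Dict[str, str] = {}
        self.routes: Dict[str, List[Nexthop]] = {}
        self.programmed: Dict[str, str] = {}

    def _refresh(self, prefix: str) -> None:
        self.programmed[prefix] = select_nexthop(self.routes[prefix], self.mux_state)

    def set_route(self, prefix: str, nexthops: List[Nexthop]) -> None:
        # Trigger 1: the route itself changes.
        self.routes[prefix] = list(nexthops)
        self._refresh(prefix)

    def move_neighbor(self, ip: str, new_port: str) -> None:
        # Trigger 2: a mux neighbor changes (here: moves to another port).
        for prefix, nhs in list(self.routes.items()):
            if any(nh.ip == ip for nh in nhs):
                self.routes[prefix] = [
                    Nexthop(ip, new_port) if nh.ip == ip else nh for nh in nhs
                ]
                self._refresh(prefix)

    def set_mux_state(self, port: str, state: str) -> None:
        # Trigger 3: a mux port changes state.
        self.mux_state[port] = state
        for prefix in self.routes:
            self._refresh(prefix)
```

Keeping the selection in one function means all three triggers share the same rule, so the programmed nexthop cannot depend on which event happened last.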

HLD: sonic-net/SONiC#1256

Signed-off-by: Nikola Dancejic [email protected]

@Ndancejic Ndancejic force-pushed the multi-nexthop branch 2 times, most recently from 266395b to 312f41a Compare February 14, 2023 23:20
@Ndancejic Ndancejic changed the title multi-nexthop routes draft [muxorch] handling multiple mux nexthops for route Feb 14, 2023
@Ndancejic Ndancejic marked this pull request as ready for review February 14, 2023 23:20
@Ndancejic Ndancejic requested a review from prsunny as a code owner February 14, 2023 23:20
@prsunny
Collaborator

prsunny commented Feb 15, 2023

Please provide the Description with all details as per the template

@Ndancejic
Contributor Author

Please provide the Description with all details as per the template

Ah my bad, that usually auto generates with my commit message but I had a draft PR open before and it didn't update. I'll fix that

@Ndancejic Ndancejic force-pushed the multi-nexthop branch 3 times, most recently from 01ee74d to adb5268 Compare February 25, 2023 06:33
@Ndancejic
Contributor Author

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 2656 in repo sonic-net/sonic-swss

@Ndancejic
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

Ndancejic added a commit to Ndancejic/sonic-mgmt that referenced this pull request Feb 28, 2023
What I did: added tests/dualtor/test_multi_mux_nexthop_route.py, which
tests the use case of multiple nexthop neighbors across different MUX ports.

DEPENDS ON: sonic-net/sonic-swss#2656

Why I did it: test coverage of this scenario

How I did it: the test creates a route to 2 different interfaces and
neighbors, then validates that traffic is received from the expected
ports.
@Ndancejic Ndancejic requested review from prsunny and removed request for theasianpianist March 7, 2023 23:32
if (it != mux_multi_nh_route_tb.end())
{
MuxCable* cable = findMuxCableInSubnet(it->second.ip_address);
if (cable == nullptr || cable->isActive())
Contributor

If I'm reading this correctly, if our route nexthop is not one of the configured mux cable IPs, we take no action. This seems a bit optimistic to me, unless we can 100% guarantee that the multi-mux-nexthop scenario only occurs with nexthop IPs that are configured as mux cable IPs?

Contributor Author

From my understanding of discussions with Prince, we currently treat neighbors on the Vlan that aren't tied to a mux cable as active. Maybe we can have a discussion on this?

Collaborator

I think findMuxCableInSubnet returns a cable only if the IP is a directly connected server IP (one that is in config_db), not for all learned neighbors. But I think we have to handle it if the nexthop belongs to any mux cable.

{
NextHopKey nexthop = *it;
MuxCable* cable = findMuxCableInSubnet(nexthop.ip_address);
if (cable == nullptr || (cable->isActive() && !active_found))
Contributor

same as above comment - if the nexthop IP isn't a mux cable IP, it seems safer to assume it's standby

Contributor Author

same as above
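
To make the two positions in this thread concrete, here is a small Python sketch (not the muxorch code) of the selection loop with the handling of a nexthop that has no mux cable made explicit. The function name `pick_nexthop`, the `unknown_is_active` flag, and the dict-based cable lookup are all hypothetical; the lookup returning nothing stands in for `findMuxCableInSubnet` returning `nullptr`.

```python
from typing import Dict, List, Optional


def pick_nexthop(
    nexthops: List[str],
    cable_state: Dict[str, str],
    unknown_is_active: bool,
) -> Optional[str]:
    """Pick the nexthop to program, or None if the caller should use the tunnel.

    cable_state maps a nexthop IP to "active" or "standby". An IP missing
    from the map models a nexthop with no mux cable.
    """
    for ip in nexthops:
        state = cable_state.get(ip)
        if state is None:
            # No mux cable for this nexthop.
            if unknown_is_active:
                return ip  # behavior described by the author: treat as active
            continue  # reviewer's suggestion: treat as standby and keep looking
        if state == "active":
            return ip
    return None


nexthops = ["10.0.0.5", "192.168.0.2"]
cables = {"192.168.0.2": "standby"}  # 10.0.0.5 has no mux cable
optimistic = pick_nexthop(nexthops, cables, unknown_is_active=True)
conservative = pick_nexthop(nexthops, cables, unknown_is_active=False)
```

With these inputs the optimistic policy programs `10.0.0.5` directly, while the conservative policy finds no usable neighbor and leaves the route to the tunnel.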

# move neighbor 2 back
self.add_neighbor(dvs, neighbors[0], macs[i])

self.del_route(dvs, route)
Contributor

Would prefer if this test case were parameterized so that we could use a fixture to do setup/teardown in case something goes wrong in the middle of the test. Depending on urgency, we can merge as-is and discuss this change in the future.

Contributor Author

Yeah, I agree with this. Currently I think the try/finally block will tear down the resources that I'm using in this test, but I can definitely rework it that way.
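
A rough sketch of the setup/teardown shape being discussed, written with the standard library's `contextlib.contextmanager` so it runs on its own. `FakeDvs` and `multi_nexthop_setup` are hypothetical stand-ins, not the helpers in the actual test. A pytest fixture that does its setup before a `yield` and its teardown after it has the same shape, and pytest runs the post-`yield` code even when the test body fails.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeDvs:
    """Hypothetical stand-in for the test's dvs handle; tracks live resources."""

    def __init__(self) -> None:
        self.routes = set()
        self.neighbors = set()


@contextmanager
def multi_nexthop_setup(dvs: FakeDvs, route: str, neighbors: List[str]) -> Iterator[None]:
    # Setup: create the neighbors first, then the route that uses them.
    for neighbor in neighbors:
        dvs.neighbors.add(neighbor)
    dvs.routes.add(route)
    try:
        yield
    finally:
        # Teardown runs even if the body raised: route first, then neighbors.
        dvs.routes.discard(route)
        for neighbor in neighbors:
            dvs.neighbors.discard(neighbor)


dvs = FakeDvs()
try:
    with multi_nexthop_setup(dvs, "2.3.4.0/24", ["192.168.0.2", "192.168.0.3"]):
        raise RuntimeError("simulated mid-test failure")
except RuntimeError:
    pass
```

After the simulated failure, `dvs.routes` and `dvs.neighbors` are both empty, which is the guarantee the reviewer is asking the fixture to provide.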

/* Check if nexthop is mux nexthop */
MuxOrch* mux_orch = gDirectory.get<MuxOrch*>();
NextHopGroupKey nhg_key;
if (inNextHopGroup(nextHop, nhg_key) && mux_orch->isMuxNexthops(nhg_key))
Collaborator

What happens if a nexthop is part of an NHG but another route also points to this single nexthop?
For example, NH1 below:

R1 -> NHG1 (NH1, NH2)
R2 -> NH1
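
The scenario in this question can be written down as a reverse index from nexthop to routes. The Python sketch below only illustrates the case being asked about; it makes no claim about how the PR handles it, and `build_nexthop_index` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def build_nexthop_index(routes: Dict[str, Tuple[str, ...]]) -> Dict[str, Set[str]]:
    """Map each nexthop to every route that references it, alone or via a group."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for route, nexthops in routes.items():
        for nexthop in nexthops:
            index[nexthop].add(route)
    return index


# R1 points at a group of two nexthops; R2 points at NH1 on its own.
routes = {"R1": ("NH1", "NH2"), "R2": ("NH1",)}
index = build_nexthop_index(routes)
```

Here `index["NH1"]` contains both `R1` and `R2`, so any change to NH1 affects a route that reaches it through a group and a route that points at it directly.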

Signed-off-by: Nikola Dancejic <[email protected]>
@prsunny prsunny merged commit 32ecc2a into sonic-net:master Apr 10, 2023
@StormLiangMS
Copy link
Contributor

@Ndancejic the cherry-pick to 202211 hit a conflict; could you raise a separate PR for the 202211 branch?

@prsunny for vis.

yxieca pushed a commit that referenced this pull request Apr 19, 2023
* [muxorch] handling multiple mux nexthops for route
