Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes for handling recursive NHG entries #1636

Merged
merged 7 commits into from
May 17, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 85 additions & 5 deletions doc/pic/hld_fpmsyncd.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,14 +41,14 @@
| 0.3 | Sep 18, 2023 | Kentaro Ebisawa (NTT) | Update based on discussion at Routing WG on Sep 14th (Scope, Warmboot/Fastboot, CONFIG_DB) |
| 0.4 | Sep 24, 2023 | Kentaro Ebisawa (NTT) | Add feature enable/disable design and CLI. Update test plan. |
| 0.5 | Nov 10, 2023 | Kanji Nakano (NTT) | Update feature enable/disable design and CLI. Update test plan. |
| 0.6 | Feb 09, 2024 | Nikhil Kelapure (BRCM) | Update multipath NHG handling. |

### Scope

This document details the design and implementation of the "fpmsyncd extension" related to NextHop Group behavior in SONiC.
The goal of this "fpmsyncd extension" is to integrate NextHop Group (NHG) functionality into SONiC by writing NextHop Group entry from `fpmsyncd` to `APPL_DB` for NextHop Group operation in SONiC.

- Scope of this change is to extend `fpmsyncd` to handle `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` messages from FPM.
- There will be no change to SWSS/Orchagent.
- This change is backward compatible. Upgrade from a SONiC version that does not support this feature does not change the user's expected behavior as this feature is disabled by default.

### Overview
Expand Down Expand Up @@ -77,6 +77,9 @@ See [09072023 Routing WG Meeting minutes](https://lists.sonicfoundation.dev/g/so
- `fpmsyncd` to SET/DEL routes to `APPL_DB: ROUTE_TABLE` using `nexthop_group`
- `fpmsyncd` to SET/DEL NextHop Group entry to `APPL_DB: NEXTHOP_GROUP_TABLE`

`Orchagent extension` requires:
- `orchagent` to handle member updates from `APPL_DB:NEXTHOP_GROUP_TABLE` for multipath NextHop Group

This feature must be disabled by default.
- When this feature is disabled, behavior will be the same as before introducing this feature.
- i.e. `NEXTHOP_GROUP_TABLE` entry will not be created and `nexthop_group` will not be used in `ROUTE_TABLE` entry in `APPL_DB`.
Expand All @@ -99,6 +102,8 @@ For multipath route, the `RTM_NEWROUTE` is sent with a list of gateway and inter

This `Fpmsyncd extension` will modify `fpmsyncd` to handle `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` as below.

This featured introduces new orchagent `NhgOrch` to handle `RTM_NEWNEXTHOP` events and update `NEXTHOP_GROUP_TABLE` entries accordingly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "NhgOrch" already exists. The HLD/PR enhances the NhgOrch to handle recursive nexthop groups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


<!-- omit in toc -->
##### Figure: Fpmsyncd NHG High Level Architecture
![fig: fpmsyncd nhg architecture](images_fpmsyncd/fpmsyncd-nhg-architecture.png)
Expand Down Expand Up @@ -230,14 +235,23 @@ admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.4
<tr><td>NHA_GROUP</td><td>[{125,1},{126,1}]</td></tr>
<tr><td rowspan="2">4</td><td rowspan="2">RTM_NEWROUTE</td><td>RTA_DST</td><td>10.1.1.4</td></tr>
<tr><td>RTA_NH_ID</td><td>ID127</td></tr>
<tr><td rowspan="2">5</td><td rowspan="2">RTM_NEWROUTE</td><td>RTA_DST</td><td>10.1.1.5</td></tr>
<tr><td>RTA_NH_ID</td><td>ID127</td></tr>
<tr><td rowspan="3">6</td><td rowspan="3">RTM_NEWNEXTHOP</td><td>NHA_ID</td><td>ID128</td></tr>
<tr><td>NHA_GATEWAY</td><td>10.0.0.4</td></tr>
<tr><td>NHA_OIF</td><td>22</td></tr>
<tr><td rowspan="2">7</td><td rowspan="2">RTM_NEWNEXTHOP</td><td>NHA_ID</td><td>ID127</td></tr>
<tr><td>NHA_GROUP</td><td>[{125,1},{128,1}]</td></tr>
</table>

A short description of `fpmsyncd` logic flow:

- When receiving `RTM_NEWNEXTHOP` events on sequence 1, 2 and 3, `fpmsyncd` will save the information in an internal list to be used when necessary.
- When `fpmsyncd` receive the `RTM_NEWROUTE` on sequence 4, the process will write the NextHop Group with ID 118 to the `NEXTHOP_GROUP_TABLE` using the information of gateway and interface from the NextHop Group events with IDs 116 and 117.
- Then `fpmsyncd` will create a new route entry to `ROUTE_TABLE` with a `nexthop_group` field with value `ID118`.
- When `fpmsyncd` receives the last `RTM_NEWROUTE` on sequence 5, the process will create a new route entry (but no NextHop Group entry) in `ROUTE_TABLE` with `nexthop_group` field with value `ID118`. (Note: This NextHop Group entry was created when the `fpmsyncd` received the event sequence 4.)
- When receiving `RTM_NEWNEXTHOP` events on sequence 1 and 2, `fpmsyncd` will write the information in `NEXTHOP_GROUP_TABLE` with NextHop Group `ID125` and `ID126` and with information of gateway and interface respectively. For sequence 3 multi path NextHop Group `ID127` the information in `NEXTHOP_GROUP_TABLE` will contain the list of IDs 125 and 126
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes are expected towards kernel programming? please capture those details if not captured in a separate document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new changes are required from frr/kernel side for this feature. When NHG is enabled in FRR, it already programs routes to kernel using reference to the NHID. This implementation in fact make the FPM programming similar to how the routes are being programmed in kernel already

- When `fpmsyncd` receive the `RTM_NEWROUTE` on sequence 4, the process will create a new route entry to `ROUTE_TABLE` with a `nexthop_group` field with value `ID127`.
- When `fpmsyncd` receives the next `RTM_NEWROUTE` on sequence 5, the process will create a new route entry (but no NextHop Group entry) in `ROUTE_TABLE` with `nexthop_group` field with value `ID127`. (Note: This NextHop Group entry was created when the `fpmsyncd` received the event sequence 4.)
- When receiving `RTM_NEWNEXTHOP` events on sequence 6 `fpmsyncd` will write the information in `NEXTHOP_GROUP_TABLE` with NextHop Group `ID128`
- When receiving `RTM_NEWNEXTHOP` events on sequence 7 `fpmsyncd` will update the information in `NEXTHOP_GROUP_TABLE` with NextHop Group `ID127` and the information will now contain list of IDs 125 and 128


#### Example of entries in ASIC_DB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capture ASIC_DB update for NHG group with multipath

Copy link
Contributor Author

@nkelapur nkelapur May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added diagram to show the updated ASIC_DB entry

Expand All @@ -248,6 +262,32 @@ Therefore, even after this enhancement, table entries will be created for `ROUTE
##### Figure: Example of ASIC_DB entry
![fig3](images_fpmsyncd/fig3.svg)

#### Orchestration Agent Changes to handle NEXTHOP_GROUP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capture the case NH as Interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"NH as Interface" will be same as any other NHG. The application ( zebra) will create a NHID for the NHG with NH as interface and the route will use this NHID as reference to the NH

A new orchestration agent `NhgOrch` will be written to handle the new NEXTHOP_GROUP_TABLE in APP_DB. For adding or updating an entry in the NEXTHOP_GROUP_TABLE, programming will depend on whether the group is configured with one or multiple next hops or is a recursive nexthop group i.e. it contains one or more nexthop group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment. NhgOrch is extended to handle recursive NEXTHOP_GROUP_TABLE entries in app-db.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


- NhgOrch treats an NEXTHOP_GROUP as "recursive" if it has "nexthop_group" as the only field. This field contains "member" NEXTHOP_GROUP. The parent NEXTHOP_GROUP key is constructed by combining keys of the member NEXTHOP_GROUPs. Such an NEXTHOP_GROUP always uses a "NEXT_HOP_GOUP" object-id to represent the group
- Singleton (non-recursive) NEXTHOP_GROUP contain various fields e.g. Inexthop, alias etc. Such NEXTHOP_GROUP form a key by combining these fields and query the NeighOrch with the "key" to get the corresponding SAI ID. So singleton (non-recursive) NEXTHOP_GROUP do not own the SAI ID rather they just reuse the SAI IDs owned by NeighOrch module.
- If one or more members of a "recursive" NEXTHOP_GROUP are not available, then the "NEXTHOP_GROUP" object is created with just the available members as long as there is one. The msg is retained in NhgOrch's mToSync queue so that it keeps checking for the availability of all the member NEXTHOP_GROUP. As and when the member NEXTHOP_GROUP are available, the "NEXTHOP_GROUP" object is updated with the new members. The "recursive" NEXTHOP_GROUP allow for members to be updated without having to update the referring routes. Since the SAI ID remains unchanged, update is transparent to the RouteOrch modules.
- When a port-down event is received from Syncd, the corresponding nexthops are removed from all the nexthop groups they're part of.

Dummy code for this logic:
```
if (nexthop_group NOT present)
sai_id = sai ID of the next hop
else
create next hop group
add next hop group member for each next hop
sai_id = sai ID of the next hop group
```

The handling of whether a next hop group member can be programmed or not, interfaces going up and down, etc. will match the same handling done by next hop groups managed by the route orchagent.

When the next hop group is deleted, the next hop group orchagent will remove the next hop group from the ASIC_DB if required, and remove the association between the group identifier from APP_DB and the SAI identifier from ASIC_DB. This will only happen once the next hop group is no longer referenced by any routes - and so the next hop group will maintain a reference count within orchagent to ensure this.

The route orchagent will parse the new nexthop_group field, and call into the next hop group orchagent to find the corresponding SAI object ID to program as the next hop attribute. If the next hop group does not exist, the route will remain on the list of routes to sync, and will be retried on future runs of the route orchagent.

If a next hop group cannot be programmed because the data plane limit has been reached, one next hop will be picked to be temporarily used for that group. When a route is then programmed using that next hop group, it will be informed that a temporary form of the next hop group is in use, and so the route will remain on the pending list and will retry until the correct next hop group is programmed. This mirrors the current behaviour in the route orchagent when the next hop group limit is reached. The APP_DB entries remain unchanged.

### SAI API

Expand Down Expand Up @@ -475,6 +515,46 @@ Sample of CONFIG DB snippet given below:
}
},
```
#### APPL DB Enhancements

This feature adds a new NEXT_HOP_GROUP_TABLE, to store next hop group information to be used by one or more routes. Recursive/ecmp routes are represented by referencing a list of nexthop group ids in the nexthop_group field. When this field is used, other fields will not be present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"adds a new .." => "extends the existing .."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done



```
### NEXT_HOP_GROUP_TABLE
Stores a list of groups of one or more next hops
Status: Mandatory
key = NEXT_HOP_GROUP_TABLE:string ;
arbitrary string identifying the next hop
group, as determined by the programming
application.
nexthop = *prefix, ; IP addresses separated
“,” (empty indicates no gateway)
ifname = *PORT_TABLE.key, ; zero or more separated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why just PORT_TABLE? how about portchannel, vlan and sub-interfaces..etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no change in this DB Table. This is as is from https://github.com/sonic-net/SONiC/blob/master/doc/ip/next_hop_group_hld.md

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow NH interfaces other than PORT_TABLE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I understand all possible existing NH interface would be allowed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just mentioning PORT_TABLE alone the schema is weird, please fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "INTF_TABLE" so that all types on interfaces are covered

by “,” (zero indicates no interface)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above fields already exist as part of the NEXT_HOP_GROUP_TABLE:

### NEXT_HOP_GROUP_TABLE

Please mark "nexthop_group" field as newly added by this HLD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table is new in this HLD. Added "existing" and "new" fields labels for the "ROUTE_TABLE"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

nexthop_group = NEXT_HOP_GROUP_TABLE:key, ; index within the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an example for the recursive routes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DB entry for recursive NHG is same as ECMP NHGs. In that for recursive NHG may contain only one NH and hence the NHA_GROUP will contain only one NHID

NEXT_HOP_GROUP_TABLE seperated by "," used
for recursice/ecmp routes. ( When this field
is present, other fields will not be present)

```
The ROUTE_TABLE is then extended to allow a reference to the next hop group to be specified, instead of the current nexthop and intf fields.
```
### ROUTE_TABLE
;Stores a list of routes
;Status: Mandatory
key = ROUTE_TABLE:prefix
nexthop = *prefix, ;IP addresses separated “,” (empty
indicates no gateway)
ifname = *PORT_TABLE.key, ; zero or more separated
by “,” (zero indicates no interface)
blackhole = BIT ; Set to 1 if this route is a blackhole
(or null0)
nexthop_group = NEXT_HOP_GROUP_TABLE:key ; index within the
NEXT_HOP_GROUP_TABLE, optionally used instead
of nexthop and intf fields
```


### Warmboot and Fastboot Design Impact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nexthop group ID reconciliation as part of the warm-boot is not currently supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Expand Down