[BufferOrch] Use SAI bulk API to configure port, PG and queue #40
Conversation
Please see comments.
In addition, a question regarding the design: is there any chance to leverage the bulker infra that has been used for all other bulk operations in orchagent?
We do not have a "real" error-handling mechanism for buffer PG, queue, and ingress/egress buffer profile list in orchagent; it simply aborts when it gets an error from a SAI API call. Maybe we can "assume" the SAI API always returns successfully, just package the arguments into the bulker object, and continue to handle the following steps (a rough sketch of this flow follows below):
- add/remove the counter
- add/remove the reference number
and then handle the bulk operation in doTask.
In addition, there is no dependency in SAI between the counter/reference number and the buffer profile. Even if SAI can fail, we can still handle the counter and reference number as if the call succeeded, and revert them once the customer reverts the configuration that fails SAI.
What do you think?
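To make the proposal above concrete, here is a minimal, purely illustrative sketch of that flow, assuming the "assume success" approach; BulkBuffer, addQueueCounter and increaseProfileRefCount are hypothetical names invented for this example, not orchagent APIs.

```cpp
#include <vector>

extern "C" {
#include <sai.h>
}

// Hypothetical helpers standing in for orchagent's counter/refcount handling.
void addQueueCounter(sai_object_id_t queueOid);
void increaseProfileRefCount(sai_object_id_t profileOid);

// One pending bulk entry: the queue object and the attribute to apply later.
struct BulkEntry
{
    sai_object_id_t oid;
    sai_attribute_t attr;
};

// Bulk buffer filled during task processing and flushed later in doTask().
struct BulkBuffer
{
    std::vector<BulkEntry> entries;
};

void processQueueAssumeSuccess(sai_object_id_t queueOid,
                               sai_object_id_t profileOid,
                               BulkBuffer& bulk)
{
    // 1. Package the arguments for the SAI call instead of calling SAI here.
    sai_attribute_t attr{};
    attr.id = SAI_QUEUE_ATTR_BUFFER_PROFILE_ID;
    attr.value.oid = profileOid;
    bulk.entries.push_back({queueOid, attr});

    // 2. Assume the SAI call will succeed: update the counter and the
    //    reference number immediately.
    addQueueCounter(queueOid);
    increaseProfileRefCount(profileOid);

    // 3. The actual bulk SAI set happens later in doTask(); on failure the
    //    counter/refcount changes are reverted when the config is reverted.
}
```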
    std::vector<PortContext> ports;
};

struct PriorityGroupTask
Maybe use a template to eliminate redundant code. I see that common logic is shared between the ingress and egress sides (queue vs. PG, ingress/egress buffer profile list).
@stephenxs Definitely worth discussing. I see common code in processPG/processQueue as well as in the ingress/egress port buffer profile list handling. Because a similar code separation already exists, I decided not to go with generic templates when adding the corresponding *Bulk and *Post methods: I tried to preserve the current flexibility of having separate methods for each object, since possible future specialization would, in the template case, add more complexity.
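For illustration only, one way such a template could look; the trait structs and addProfileToBulk are hypothetical names, not code from this PR, and the sketch assumes that only the SAI attribute id differs between the PG and queue paths.

```cpp
#include <vector>

extern "C" {
#include <sai.h>
}

// Per-object-type traits: only the buffer-profile attribute id differs.
struct PgTraits
{
    static constexpr sai_attr_id_t profile_attr = SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE;
};

struct QueueTraits
{
    static constexpr sai_attr_id_t profile_attr = SAI_QUEUE_ATTR_BUFFER_PROFILE_ID;
};

// Shared logic for queuing one "apply buffer profile" operation into a bulk.
template <typename Traits>
void addProfileToBulk(sai_object_id_t oid,
                      sai_object_id_t profileOid,
                      std::vector<sai_object_id_t>& oids,
                      std::vector<sai_attribute_t>& attrs)
{
    sai_attribute_t attr{};
    attr.id = Traits::profile_attr;
    attr.value.oid = profileOid;
    oids.push_back(oid);
    attrs.push_back(attr);
}

// Usage: addProfileToBulk<PgTraits>(pgOid, profileOid, oids, attrs);
//        addProfileToBulk<QueueTraits>(queueOid, profileOid, oids, attrs);
```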
    std::vector<sai_attribute_t> attrs;
    std::vector<sai_status_t> statuses;

    auto& bulk = m_queueBulk[op];
The items can be pushed into m_queueBulk in any order, but DEL operations are handled first and then SET.
Will there be an issue? E.g. with the following sequence:
- set Ethernet0:3-4
- del Ethernet0:3-4
the buffer item should end up removed, but the bulk operation would create it.
Maybe this can be resolved in either of the following ways:
- Save the last operation (DEL or SET) and invoke the SAI bulk set API once the current operation differs from the last one. This should not impact performance, since in most cases we have one operation in a period.
- Push DEL to m_queueBulk only if there is a SET operation for the same queue.
@stephenxs I designed this based on the invariants guaranteed by the m_toSync map - https://github.com/sonic-net/sonic-swss/blob/master/orchagent/orch.cpp#L108.
A SET and then a DEL will result in removal of all keys from m_toSync and insertion of the DEL task - https://github.com/sonic-net/sonic-swss/blob/master/orchagent/orch.cpp#L103.
So, in that sequence, only the last DEL operation will be in m_toSync, resulting in a SAI bulk set operation with the buffer profile set to SAI_NULL_OBJECT_ID.
Now consider a DEL and then a SET command for the same key (a possible scenario based on the comment in addToSync). Both result in SAI set operations. If we combine them in a single bulk, there is no guarantee that SAI preserves the right order. Considering what a SAI driver may do, this makes sense: for efficient configuration, SAI may group the set commands by buffer profile and apply them out of order.
Therefore, there are separate bulk buffers for DEL and SET, and we flush the DEL commands first.
Please note that SET and SET for different keys with overlapping queues (for example, Ethernet0:3-4 and Ethernet0:2-3) is considered invalid configuration, and the order in which such commands are applied is undetermined with or without this change, since they can be ordered arbitrarily in m_toSync, which is a multimap.
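A compressed, illustrative sketch of the scheme described in the reply above (separate bulk buffers keyed by DB operation, with the DEL buffer flushed before the SET buffer). m_queueBulk mirrors the member seen in the diff, while flushQueueBulks and bulkSetQueueAttributes are placeholder names, since the exact SAI/sairedis bulk entry point used by the PR is not shown here.

```cpp
#include <map>
#include <string>
#include <vector>

extern "C" {
#include <sai.h>
}

// Hypothetical flush helper standing in for the actual bulk SAI set call.
void bulkSetQueueAttributes(const std::vector<sai_object_id_t>& oids,
                            const std::vector<sai_attribute_t>& attrs,
                            std::vector<sai_status_t>& statuses);

// One bulk buffer per DB operation ("DEL" / "SET").
struct QueueBulk
{
    std::vector<sai_object_id_t> oids;
    std::vector<sai_attribute_t> attrs;
    std::vector<sai_status_t> statuses;
};

std::map<std::string, QueueBulk> m_queueBulk;

void flushQueueBulks()
{
    // DEL is flushed in its own bulk before SET, so a DEL-then-SET pair for
    // the same key keeps its relative order even if the driver reorders
    // operations inside a single bulk call.
    for (const char* op : {"DEL", "SET"})
    {
        QueueBulk& bulk = m_queueBulk[op];
        if (bulk.oids.empty())
        {
            continue;
        }
        bulk.statuses.assign(bulk.oids.size(), SAI_STATUS_SUCCESS);
        bulkSetQueueAttributes(bulk.oids, bulk.attrs, bulk.statuses);
        bulk = QueueBulk{};
    }
}
```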
You are correct; I forgot this logic.
Overlapping queues can be treated as invalid configuration.
struct PortContext
{
    std::string port_name;
    std::vector<PgContext> pgs;
So, the benefit of organizing the queues and PGs hierarchically is to avoid storing port-level variables for each queue?
@stephenxs Inherently yes, the port-level context is shared by multiple queues of that port.
So, if I understand correctly, the task could look like EthernetX,EthernetY,EthernetZ|queue1,queue2,queue3, and the question is why we store queues 1-3 for each port and not a port for each queue?
I find that most of the config is one port with multiple PGs/queues, so this data structure is optimal for that usual case.
The code that processes PGs and queues looked like this:

    for p in ports:
        for q in queues:
            ...

and not the other way around, so the data structure is derived from the code structure; the corresponding *Post method then runs the same two for-loops.
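To make the hierarchy concrete, here is a small sketch: PortContext and PriorityGroupTask follow the shape visible in the diff fragments above, while PgContext's fields and the forEachPg helper are assumptions added only for illustration.

```cpp
#include <string>
#include <vector>

extern "C" {
#include <sai.h>
}

// Per-PG context; these fields are assumptions for illustration.
struct PgContext
{
    size_t index;                        // PG index on the port
    sai_object_id_t buffer_profile_oid;  // profile to apply to this PG
};

// Port-level data is stored once and shared by all PGs of that port.
struct PortContext
{
    std::string port_name;
    std::vector<PgContext> pgs;
};

struct PriorityGroupTask
{
    std::vector<PortContext> ports;
};

// Both the *Bulk and the *Post methods walk the same two loops, so the data
// structure simply mirrors the existing code structure.
template <typename Handler>
void forEachPg(PriorityGroupTask& task, Handler handle)
{
    for (auto& port : task.ports)
    {
        for (auto& pg : port.pgs)
        {
            handle(port, pg);
        }
    }
}
```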
Regarding "assuming" success: it is a compelling idea that I had when originally presenting the design. It works great for fast-boot and warm-boot, where any failure is a hard crash. Due to the ask from MSFT to also handle cold boot and runtime configuration, I was not comfortable assuming success and instead tried to provide an infra to handle the status code, at the cost of having to preserve the task context. Also
OK, we cannot handle the reference number if it retries.
What I did
Make use of the SAI set bulk API to improve switch boot-up performance, especially in warm-boot and fast-boot scenarios.
The general concept:
1. process* methods add the SAI operation, together with a task context, to a bulk buffer. Bulk buffers are split by DB operation.
2. The bulk buffers are flushed through the SAI bulk set API (DEL first, then SET).
3. process*Post methods are invoked to handle the SAI status codes and perform post-set operations, like enabling the FC counter for a PG/queue upon success.

This design allows re-use of all the existing code that is written to handle one task at a time; only a small change is needed to maintain task context persistence throughout steps 1-3.
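As an illustration of step 3, a sketch of how the preserved context can be matched back to the per-object statuses, building on the PortContext/PgContext structs sketched earlier in this thread; processPriorityGroupBulkPost and enableFlexCounter are hypothetical names and signatures, not the PR's exact methods.

```cpp
#include <cstddef>
#include <string>
#include <vector>

extern "C" {
#include <sai.h>
}

// Placeholder for the post-set action performed on success.
void enableFlexCounter(const std::string& port_name, size_t pg_index);

// The *Post step walks the preserved task context in the same order the
// operations were pushed into the bulk, so statuses[i] matches ports/pgs.
void processPriorityGroupBulkPost(const PriorityGroupTask& task,
                                  const std::vector<sai_status_t>& statuses)
{
    size_t i = 0;
    for (const auto& port : task.ports)
    {
        for (const auto& pg : port.pgs)
        {
            sai_status_t status = statuses[i++];
            if (status != SAI_STATUS_SUCCESS)
            {
                // Log the failure and keep the task for retry instead of aborting.
                continue;
            }
            enableFlexCounter(port.port_name, pg.index);
        }
    }
}
```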
Why I did it
To improve fast-boot/warm-boot convergence time.
How I verified it
Manual verification by booting up the T0 topology configuration and running basic tests with the static and dynamic buffer models. Fast-boot was tested: an approximately 2x improvement in buffer configuration time was observed on a 256-port system (from 3 sec to 1.7 sec) when syncd falls back to the single SET API. A bigger improvement is expected with proper SAI driver bulk SET support.
Details if related