netlink reports an error=-33 on reading a netlink socket #353
Comments
https://github.com/thom311/libnl/blob/7bf2e64654de75578db34b0eab53a3e4d106d65f/include/netlink/errno.h#L52 |
I think in this case the netlink flag NLM_F_DUMP_INTR was set, which means the dump was interrupted and the messages may be incomplete or inconsistent. |
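For illustration, here is a minimal sketch of how a receiver could spot that flag on a message header. As far as I can tell, libnl does an equivalent check internally and returns -NLE_DUMP_INTR (the -33 in the title) when no callback handles it. `MsgHeader` and `dumpWasInterrupted` are hypothetical names, and the flag values are copied from `<linux/netlink.h>` so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Flag values mirror <linux/netlink.h>; redefined locally so the sketch
// is self-contained (assumption: these match the running kernel headers).
constexpr std::uint16_t kNlmFMulti    = 0x02;  // NLM_F_MULTI
constexpr std::uint16_t kNlmFDumpIntr = 0x10;  // NLM_F_DUMP_INTR

// Hypothetical stand-in with the same field layout as struct nlmsghdr.
struct MsgHeader {
    std::uint32_t len;
    std::uint16_t type;
    std::uint16_t flags;
    std::uint32_t seq;
    std::uint32_t pid;
};

// Returns true when the kernel marked this dump message as interrupted,
// i.e. the dumped table changed while the dump was in progress.
bool dumpWasInterrupted(const MsgHeader& h) {
    return (h.flags & kNlmFDumpIntr) != 0;
}
```

With a real socket the same test would be applied to `nlmsg_flags` of each received `struct nlmsghdr`.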
is this related to the non-blocking socket issue? |
Not 100 percent sure, but it seems to be more a matter of kernel-internal processing. |
Update: I have not been able to read through netlink_dump() and its callers thoroughly. One guess is that the non-blocking socket might increase the chance of this problem: if no data is immediately available on the socket, it won't wait in recvmsgs() for the NLMSG_DONE message, but instead returns and waits on Select(). SWSS_LOG_DEBUG("netlink reports NLE_AGAIN on reading a netlink socket"); could be changed to NOTICE level for now so we can check whether NLE_DUMP_INTR always happens after NLE_AGAIN. Update 2:
Update 3: |
Yes, maybe we can change this to NOTICE for now. |
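To make the NLE_AGAIN versus NLE_DUMP_INTR distinction discussed above concrete, here is a rough sketch of how the return value of a receive call could be classified before logging. This is not the actual swss code: `Action` and `classifyRecvResult` are hypothetical names, and the two constants are copied from libnl's errno.h so the snippet is self-contained.

```cpp
#include <cassert>

// Values copied from libnl's include/netlink/errno.h (assumption: unchanged
// in the libnl version in use). libnl returns these negated on failure.
constexpr int kNleAgain    = 4;   // NLE_AGAIN: no data on a non-blocking socket
constexpr int kNleDumpIntr = 33;  // NLE_DUMP_INTR: dump inconsistency detected

// Hypothetical set of reactions a caller might choose from.
enum class Action { Continue, WaitForData, LogNotice, LogError };

// Maps a libnl-style return code (>= 0 on success, negative NLE_* on error)
// to a reaction.
Action classifyRecvResult(int err) {
    if (err >= 0) {
        return Action::Continue;
    }
    switch (-err) {
        case kNleAgain:
            return Action::WaitForData;  // go back to waiting for readiness
        case kNleDumpIntr:
            return Action::LogNotice;    // the error=-33 case from this issue
        default:
            return Action::LogError;
    }
}
```

Logging both cases at NOTICE with a timestamp would show whether -33 is always preceded by NLE_AGAIN, as guessed above.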
Hi all, I've recently observed this in one of our regression runs.
Did anyone encounter this issue after the upgrade to libnl 3.5.0? @pavel-shirshov I have a question on your comment in PR #3967: "3.5.0 fixes (at least declared) following issue which bites us sometimes". Can you share the commit/patch which addressed this issue? I was not able to find any release notes for version 3.5.0. |
syslog_with_netlink_debug_flag_enabled.log I enabled the NCDB flag to record the netlink messages, and luckily the 'error=-33' log was also seen after a few trials. After some analysis, these are my observations regarding the reason for this log:
RTM_NEWLINK (sent when a link is created or its attributes are set): I believe messages are being coalesced between RTM_GETLINK (which has a non-zero sequence number) and RTM_NEWLINK (which has a sequence number of 0 since it is a kernel notification).
It is hard to tell from the attached log which pair of notifications caused this inconsistency, because they all happen within an interval of less than 1 second. However, I am not sure how the prev_seq and seq attributes of netlink_callback are set, and I might be wrong in thinking that sequence numbers from different requests (RTM_GETLINK and RTM_NEWLINK) are somehow causing the NLM_F_DUMP_INTR flag to be set. |
@vivekreddynv, do you know what the implication of receiving such a message is? Do we need to do anything or not? |
What implication does this have? Why does this happen? As a side note, there is also an issue reported in libnl3 (not from SONiC) regarding this. So, currently, this log seems harmless and no functional impact has been observed. Any additional information would require further investigation. |
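On the prev_seq/seq question, here is a simplified model of the consistency check as I understand it from the kernel's nl_dump_check_consistent() helper. Treat it as an assumption to verify against the kernel version in use. In that reading, seq is a generation counter that the dump handler copies from the table being dumped, not the nlmsg_seq of any request or notification, so the flag would be set when the table changes between two chunks of the same dump. `DumpCallback` and `checkConsistent` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kDumpIntrFlag = 0x10;  // NLM_F_DUMP_INTR from <linux/netlink.h>

// Hypothetical model of the two relevant fields of struct netlink_callback.
struct DumpCallback {
    unsigned int seq = 0;      // generation of the dumped table, set by the dump handler
    unsigned int prevSeq = 0;  // generation seen when the previous message was emitted
};

// Modeled on nl_dump_check_consistent(): if the generation changed since the
// previous message of this dump, mark the message as interrupted.
std::uint16_t checkConsistent(DumpCallback& cb, std::uint16_t flags) {
    if (cb.prevSeq != 0 && cb.seq != cb.prevSeq) {
        flags |= kDumpIntrFlag;
    }
    cb.prevSeq = cb.seq;
    return flags;
}
```

Under this model, a link being created or modified (the event that also produces an RTM_NEWLINK notification) while an RTM_GETLINK dump is in flight would bump the generation and trip the check, which would fit the timing seen in the log.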