Fix Deadlock in Flush Function Due to ENOBUFS #286

Merged
merged 2 commits into from
Nov 29, 2024


@patryk4815 patryk4815 commented Nov 25, 2024

Hi.
This PR resolves an issue in the Flush function, where a deadlock occurs when the kernel returns an ENOBUFS error.
This issue has been observed in our production environment 🙃 (cc: @Ignatella)

Changes:

  • Fix deadlock in Flush function
  • Add a test that simulates the issue using reduced read/write buffers, confirming that the fix works and guarding against regressions

Debugger:

Switched from 56906 to 1 (thread 2855201)
(dlv) bt
0  0x000000000043de2e in runtime.gopark
    at runtime/proc.go:399
1  0x00000000004368b7 in runtime.netpollblock
    at runtime/netpoll.go:564
2  0x0000000000468425 in internal/poll.runtime_pollWait
    at runtime/netpoll.go:343
3  0x00000000004dbe67 in internal/poll.(*pollDesc).wait
    at internal/poll/fd_poll_runtime.go:84
4  0x00000000004e1fca in internal/poll.(*pollDesc).waitRead
    at internal/poll/fd_poll_runtime.go:89
5  0x00000000004e1fca in internal/poll.(*FD).RawRead
    at internal/poll/fd_unix.go:708
6  0x00000000004eb20a in os.(*rawConn).Read
    at os/rawconn.go:31
7  0x000000000079a56b in syscall.RawConn.Read-fm
    at <autogenerated>:1
8  0x0000000000798c49 in github.com/mdlayher/socket.rwT[go.shape.struct { github.com/mdlayher/socket.n int; github.com/mdlayher/socket.oobn int; github.com/mdlayher/socket.recvflags int; github.com/mdlayher/socket.from golang.org/x/sys/unix.Sockaddr }]
    at github.com/mdlayher/[email protected]/conn.go:795
9  0x00000000007984f2 in github.com/mdlayher/socket.readT[go.shape.struct { github.com/mdlayher/socket.n int; github.com/mdlayher/socket.oobn int; github.com/mdlayher/socket.recvflags int; github.com/mdlayher/socket.from golang.org/x/sys/unix.Sockaddr }]
    at github.com/mdlayher/[email protected]/conn.go:666
10  0x0000000000791eb4 in github.com/mdlayher/socket.(*Conn).Recvmsg
    at github.com/mdlayher/[email protected]/conn.go:572
11  0x000000000079f3f6 in github.com/mdlayher/netlink.(*conn).Receive
    at github.com/mdlayher/[email protected]/conn_linux.go:130
12  0x000000000079d9c2 in github.com/mdlayher/netlink.(*Conn).receive
    at github.com/mdlayher/[email protected]/conn.go:279
13  0x000000000079d747 in github.com/mdlayher/netlink.(*Conn).lockedReceive
    at github.com/mdlayher/[email protected]/conn.go:238
14  0x000000000079d62d in github.com/mdlayher/netlink.(*Conn).Receive
    at github.com/mdlayher/[email protected]/conn.go:231
15  0x00000000007ac35e in github.com/google/nftables.receiveAckAware
    at github.com/google/[email protected]/conn.go:94
16  0x00000000007acec5 in github.com/google/nftables.(*Conn).Flush
.......
    at runtime/proc.go:267
24  0x000000000046dbc1 in runtime.goexit
    at runtime/asm_amd64.s:1650
 
(dlv) frame 16

(dlv) p errs
error(*errors.joinError) *{
	errs: []error len: 1, cap: 1, [
		...,
	],}

(dlv) p errs.errs
[]error len: 1, cap: 1, [
	*github.com/mdlayher/netlink.OpError {
		Op: "receive",
		Err: error(*os.SyscallError) ...,
		Message: "",
		Offset: 0,},
]

(dlv) p errs.errs[0]
error(*github.com/mdlayher/netlink.OpError) *{
	Op: "receive",
	Err: error(*os.SyscallError) *{
		Syscall: "recvmsg",
		Err: error(syscall.Errno) *(*error)(0xc000034090),},
	Message: "",
	Offset: 0,}

(dlv) p errs.errs[0].Err
error(*os.SyscallError) *{
	Syscall: "recvmsg",
	Err: error(syscall.Errno) ENOBUFS (105),}

(dlv) p errs.errs[0].Err.Err
error(syscall.Errno) ENOBUFS (105)


google-cla bot commented Nov 25, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


@stapelberg stapelberg left a comment

Thanks for your PR!

Review thread on nftables_test.go (outdated, resolved)
@stapelberg stapelberg merged commit c96bb63 into google:main Nov 29, 2024
2 checks passed