Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry syscalls on EINTR in internal.Connect (fixes: #141) #152

Merged
merged 5 commits into from
Dec 19, 2024

Conversation

MaxMaeder
Copy link
Contributor

Retry connect and select syscalls if we get EINTR in Internal.Connect().

@MaxMaeder MaxMaeder requested a review from sergiu128 as a code owner December 13, 2024 05:03
Copy link
Collaborator

@sergiu128 sergiu128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you very much! Just needs to handle some edge cases.. and my mistake of not correctly testing for syscall.Errnos :)

if err := syscall.Connect(fd, ToSockaddr(remoteAddr)); err != nil {
// this can happen if the socket is nonblocking, so we fix it with a select
// https://man7.org/linux/man-pages/man2/connect.2.html#EINPROGRESS
for {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlikely to happen, but there's an extreme case here where we always get EINTR. In this scenario we don't want to hot loop endlessly, so I think we should take note of the timeout here by making it a for condition. We'll then only loop for at most timeout.

@@ -130,40 +130,67 @@ func connect(fd int, remoteAddr net.Addr, timeout time.Duration, opts ...sonicop
return err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another improvement is to return an error if the timeout is zero.

}

// Retry the connect syscall if interrupted
if err == syscall.EINTR {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another improvement is to test for Errnos correctly, according to the spec https://pkg.go.dev/syscall#Errno. This means using errors.Is pretty much everywhere in this function.


for {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also not hot-loop endlessly here if select gets EINTR. Because this syscall takes a timeout already, what we can do here is retry on EINTR for at most say 5 * timeout.

@MaxMaeder
Copy link
Contributor Author

MaxMaeder commented Dec 13, 2024

Should be fixed! I now test for errno correctly.

For the timeouts, I added a specific case in the connect() loop so we can return a sonicerrors.ErrTimeout:

// Prevent an infinite connect() loop, time out eventually
if time.Since(startTime) > 5 * timeout {
	return sonicerrors.ErrTimeout
}

For the select() loop, I just updated the timeout we pass to the syscall.

Linux will update t with the time not slept after each call to select() (man page), so we don't need an separate check like the other loop. This works because we already check if an individual select() call times out.

This was not a very readable solution, and had dubious compatibility with BSD etc. I've replaced the select() loop logic with:

for {
	// Prevent an infinite select() loop, time out eventually
	remainingTime := 5 * timeout - time.Since(startTime)
	if remainingTime < 0 { 
		return sonicerrors.ErrTimeout
	}

	t := unix.NsecToTimeval(remainingTime.Nanoseconds())

	n, err := unix.Select(fd+1, nil, &fds, nil, &t)

	...

@MaxMaeder MaxMaeder requested a review from sergiu128 December 13, 2024 19:05
Copy link
Collaborator

@sergiu128 sergiu128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent solution! Thanks a lot, changes are good! Sorry for the late review, had a lot going on at work the past week.

@sergiu128 sergiu128 merged commit e0c8035 into talostrading:master Dec 19, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants