Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime, internal/poll: darwin: ensure that no thread is consumed, nor a syscall.Read if FD isn't yet ready for I/O #33953

Closed
odeke-em opened this issue Aug 29, 2019 · 11 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations)
Milestone

Comments

@odeke-em
Copy link
Member

Coming here from #32326.
That test was meant to ensure that an (*os.File).Read which is backed by internal/poll would ONLY consume a thread for a read syscall if I/O was ready.

Examining @MichaelTJones' crash dump in reveals in deed that when it crashed

$ grep 'os_test.go' stack | sort | uniq
/Users/mtj/go/src/os/os_test.go:2283 
/Users/mtj/go/src/os/os_test.go:2283 +0x370 
/Users/mtj/go/src/os/os_test.go:2286 +0x84 
/Users/mtj/go/src/os/os_test.go:2286 +0x84 fp=0xc00025afa8 sp=0xc00025af48 
/Users/mtj/go/src/os/os_test.go:2297 +0x3aa 

non of the code had gotten past blocking on a syscall.Read as per

<-creading

and also from his execution traces
image

Somehow the poller is tripping out and not blocking until the file descriptor is ready for I/O.

In that test, our high water mark for the max number of threads was 50 but for regular machines where GOMAXPROCS=2,4,8, the maximum value of threads if every call was doing a syscall.Read would be 10, 18, 34 if we follow

        P := runtime.GOMAXPROCS(0)
        n := threads
        maxThreads := 0
        if n/P-1 >= P {
                maxThreads = 2 + 4*P
        } else {
                maxThreads = 1 + (7 * P / 4) + (n / P)
        }

but in his case, given GOMAXPROCS=32,36, it would be 61, 67.

Hence why his test failed but it went undetected for usually small values of GOMAXPROCS. Let's also examine the behaviors between versions of Go to see when things regressed.

@odeke-em odeke-em added this to the Go1.14 milestone Aug 29, 2019
@odeke-em odeke-em changed the title runtime, internal/poll: ensure that nothread is consumed if FD isn't yet ready for I/O runtime, internal/poll: ensure that no thread is consumed, nor a syscall.Read if FD isn't yet ready for I/O Aug 29, 2019
@julieqiu
Copy link
Member

/cc @aclements @randall77

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 29, 2019
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/192342 mentions this issue: os: skip TestPipeThreads on GOOS=darwin

@ianlancetaylor ianlancetaylor changed the title runtime, internal/poll: ensure that no thread is consumed, nor a syscall.Read if FD isn't yet ready for I/O runtime, internal/poll: darwin: ensure that no thread is consumed, nor a syscall.Read if FD isn't yet ready for I/O Aug 31, 2019
gopherbot pushed a commit that referenced this issue Aug 31, 2019
Updates #32326.
Updates #33953.

Change-Id: I97a1cbe682becfe9592e19294d4d94f5e5b16c21
Reviewed-on: https://go-review.googlesource.com/c/go/+/192342
Run-TryBot: Emmanuel Odeke <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/192757 mentions this issue: os: skip TestPipeThreads on GOOS=darwin

tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
Updates golang#32326.
Updates golang#33953.

Change-Id: I97a1cbe682becfe9592e19294d4d94f5e5b16c21
Reviewed-on: https://go-review.googlesource.com/c/go/+/192342
Run-TryBot: Emmanuel Odeke <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@odeke-em
Copy link
Member Author

odeke-em commented Sep 2, 2019

I found the commit that introduced this problem, commit a3b0144#diff-38e21adab3460688e3ce0b71fa9422d0 aka CL https://go-review.googlesource.com/c/141639 which is a part of Go1.12.

This program should be able to run alright with GOMAXPROCS=8 and a max of 8 threads before that commit

package main

import (
	"log"
	"os"
	"runtime/debug"
	"sync"
	"syscall"
	"time"
)

func main() {
	n := 100

	var rl, wl []*os.File
	for i := 0; i < n; i++ {
		prc, pwc, err := os.Pipe()
		if err != nil {
			// Cleanup.
			for j := 0; j < i; j++ {
				wl[i].Close()
				rl[i].Close()
			}
		}
		rl = append(rl, prc)
		wl = append(rl, pwc)
	}

	defer debug.SetMaxThreads(debug.SetMaxThreads(8))

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			bs := make([]byte, 4)
			wg.Done()
			prc := rl[i]
			if _, err := prc.Read(bs); err != nil {
				log.Fatalf("Failed to read %d: %v\n", i, err)
			}
		}(i)
	}

	wg.Wait()
	println("Waiting now")
	for {
		<-time.After(5 * time.Second)
		if true {
			return
		}
		proc, _ := os.FindProcess(os.Getpid())
		proc.Signal(syscall.SIGQUIT)
	}

	for i := 0; i < n; i++ {
		if _, err := wl[i].Write([]byte("Hello")); err != nil {
			log.Fatalf("Write #%d failed: %v", err)
		}
	}
}

Kindly pinging @randall77.
For Go1.13, we've disabled the test on Darwin but after we've fixed the issue, we'll need to backport to Go1.12 and Go1.13 as well.

@odeke-em odeke-em added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 2, 2019
gopherbot pushed a commit that referenced this issue Sep 3, 2019
Updates #32326.
Updates #33953.

Change-Id: I97a1cbe682becfe9592e19294d4d94f5e5b16c21
Reviewed-on: https://go-review.googlesource.com/c/go/+/192342
Run-TryBot: Emmanuel Odeke <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
(cherry picked from commit bac5b3f)
Reviewed-on: https://go-review.googlesource.com/c/go/+/192757
Run-TryBot: Andrew Bonventre <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
Updates golang#32326.
Updates golang#33953.

Change-Id: I97a1cbe682becfe9592e19294d4d94f5e5b16c21
Reviewed-on: https://go-review.googlesource.com/c/go/+/192342
Run-TryBot: Emmanuel Odeke <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@odeke-em odeke-em added the Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations) label Sep 14, 2019
@odeke-em
Copy link
Member Author

odeke-em commented Oct 6, 2019

Some great news, @minux's CL https://go-review.googlesource.com/c/go/+/197938 fixed this issue in that change from entersyscallblock() to entersyscall() so we'll need to backport Minux's CL to both Go1.12 and Go1.13 as per https://groups.google.com/d/msg/golang-dev/UYDTDWHJsC8/a1_4KiKnCwAJ.

@gopherbot please backport this issue to Go1.12 and Go1.13.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #34713 (for 1.12), #34714 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/199477 mentions this issue: os: re-enable TestPipeThreads on darwin

@networkimprov
Copy link

@odeke-em also need to revert 4c8037b

@odeke-em
Copy link
Member Author

odeke-em commented Oct 6, 2019 via email

@networkimprov
Copy link

I don't follow you... The commit I referenced dropped the test on the 1.13 branch. The test has been reinstated on tip.

@ianlancetaylor
Copy link
Member

We don't need to re-enable the test on the branch. That's not a bug fix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations)
Projects
None yet
Development

No branches or pull requests

5 participants