Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] use go 1.22 #4262

Closed

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Apr 29, 2024

An alternative to #4193.

As @cyphar has indicated that we can rewrite the tid field of struct pthread through CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID, but first of all, we should find out the tid field's address of the struct pthread. Unfortunatelly, struct pthread wasn't export to pthread.h in glibc, so we should prctl(PR_GET_TID_ADDRESS) or pthread_t to get or scan the tid field's address.

To scan it from pthread_t, @cyphar wants to write out the struct pthread accordding glibc's code, because the code of this struct has no changes for about twenty years. But I think it is hard to review, and we should change runc code once struct pthread changed. I propose to use another way to find out it by scanning the memory of two threads, because:

  1. If we create a new thread 'A' in main thread, the content of struct pthread of 'A' has about 5 fields value changed according to the content of struct pthread of main; If we scan them and compare them to the actual tid of thread 'A' and main thread, the tid address could be easily to find out;
  2. After the thread 'A' has exited, the tid value of struct pthread will be set to -1, so we can verify the tid address from step 1 is correct or not.

@lifubang lifubang force-pushed the feat-go-1.22-tid-offset branch 6 times, most recently from b19fa35 to 3b2e786 Compare April 29, 2024 14:52
@kolyshkin
Copy link
Contributor

@lifubang can you please create a separate PR for "use go mod instead of go get in spec.bats"? We can merge it right now, less stuff to review later.

@lifubang
Copy link
Member Author

can you please create a separate PR for "use go mod instead of go get in spec.bats"?

#4264

@lifubang lifubang force-pushed the feat-go-1.22-tid-offset branch from 02c487b to 4e2e957 Compare April 30, 2024 04:59
@AkihiroSuda
Copy link
Member

Needs rebase

lifubang added 3 commits May 4, 2024 07:57
Signed-off-by: lifubang <[email protected]>
Signed-off-by: lfbzhm <[email protected]>
This reverts commit ac31da6.

Signed-off-by: lifubang <[email protected]>
Signed-off-by: lfbzhm <[email protected]>
This reverts commit e377e16.

Signed-off-by: lifubang <[email protected]>
Signed-off-by: lfbzhm <[email protected]>
@lifubang lifubang force-pushed the feat-go-1.22-tid-offset branch 3 times, most recently from 0705b2f to e139be2 Compare May 4, 2024 13:46
@lifubang lifubang marked this pull request as draft May 4, 2024 13:53
@lifubang lifubang force-pushed the feat-go-1.22-tid-offset branch from e139be2 to cb1b2dd Compare May 4, 2024 15:28
@lifubang
Copy link
Member Author

lifubang commented May 4, 2024

Every tests are passed except unit test in centos7, I have no idea about this, could someone help me to figure out the reason.
It seems that we have got the correct tid_addr, and the runc init process has went into golang routine, but the socket commution between runc run and runc init has blocked for some unknown reasons.

=== RUN   TestUsernsCheckpoint
panic: test timed out after 3m0s
running tests:
	TestUsernsCheckpoint (3m0s)
goroutine 40 [running]:
testing.(*M).startAlarm.func1()
	/usr/local/go/src/testing/testing.go:2259 +0x3b9
created by time.goFunc
	/usr/local/go/src/time/sleep.go:176 +0x2d
goroutine 1 [chan receive]:
testing.(*T).Run(0xc0001341a0, {0x8f899b?, 0x52e01c?}, 0x917968)
	/usr/local/go/src/testing/testing.go:1649 +0x3c8
testing.runTests.func1(0xe8c340?)
	/usr/local/go/src/testing/testing.go:2054 +0x3e
testing.tRunner(0xc0001341a0, 0xc0000c3bf8)
	/usr/local/go/src/testing/testing.go:1595 +0xff
testing.runTests(0xc00011a140?, {0xe72360, 0x41, 0x41}, {0xc0000c3d00?, 0x41de48?, 0xe8b9c0?})
	/usr/local/go/src/testing/testing.go:2052 +0x445
testing.(*M).Run(0xc00011a140)
	/usr/local/go/src/testing/testing.go:1925 +0x636
github.com/opencontainers/runc/libcontainer/integration.TestMain(0x4766da?)
	/home/runc/libcontainer/integration/init_test.go:21 +0x13
main.main()
	_testmain.go:177 +0x1c6
goroutine 18 [syscall]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0xc00021ef20?, 0x415e05?)
	/usr/local/go/src/syscall/syscall_linux.go:91 +0x30
golang.org/x/sys/unix.recvfrom(0x4b1980?, {0x0?, 0x0?, 0xc00018c310?}, 0xc000229bb8?, 0xc000196a20?, 0xc00021f000?)
	/home/runc/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:527 +0x6f
golang.org/x/sys/unix.Recvfrom(0x30?, {0x0, 0x0, 0x0}, 0x7f2a7868ba08?)
	/home/runc/vendor/golang.org/x/sys/unix/syscall_unix.go:332 +0x70
github.com/opencontainers/runc/libcontainer.(*syncSocket).ReadPacket(0xc000190120)
	/home/runc/libcontainer/sync_unix.go:45 +0x6d
github.com/opencontainers/runc/libcontainer.doReadSync(0xc000190120)
	/home/runc/libcontainer/sync.go:128 +0x85
github.com/opencontainers/runc/libcontainer.parseSync(0x992620?, 0xc00021f398)
	/home/runc/libcontainer/sync.go:191 +0x45
github.com/opencontainers/runc/libcontainer.(*initProcess).start(0xc00018c310)
	/home/runc/libcontainer/process_linux.go:701 +0x9d0
github.com/opencontainers/runc/libcontainer.(*Container).start(0xc00019e1e0, 0xc0001e2000)
	/home/runc/libcontainer/container_linux.go:344 +0x20e
github.com/opencontainers/runc/libcontainer.(*Container).Start(0xc00019e1e0, 0xc0001e2000)
	/home/runc/libcontainer/container_linux.go:214 +0xdb
github.com/opencontainers/runc/libcontainer.(*Container).Run(0x99bab8?, 0xc0001e2000)
	/home/runc/libcontainer/container_linux.go:227 +0x1d
github.com/opencontainers/runc/libcontainer/integration.testCheckpoint(0xc000134340, 0x1)
	/home/runc/libcontainer/integration/checkpoint_test.go:93 +0x485
github.com/opencontainers/runc/libcontainer/integration.TestUsernsCheckpoint(0xc000134340)
	/home/runc/libcontainer/integration/checkpoint_test.go:50 +0x145
testing.tRunner(0xc000134340, 0x917968)
	/usr/local/go/src/testing/testing.go:1595 +0xff
created by testing.(*T).Run in goroutine 1
	/usr/local/go/src/testing/testing.go:1648 +0x3ad
goroutine 35 [IO wait]:
internal/poll.runtime_pollWait(0x7f2ac1bc1c90, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc000184540?, 0xc0000f20c7?, 0x1)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000184540, {0xc0000f20c7, 0xf39, 0xf39})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a
os.(*File).read(...)
	/usr/local/go/src/os/file_posix.go:29
os.(*File).Read(0xc00018e0a0, {0xc0000f20c7?, 0x413b45?, 0x10?})
	/usr/local/go/src/os/file.go:118 +0x52
bufio.(*Scanner).Scan(0xc00018a400)
	/usr/local/go/src/bufio/scan.go:214 +0x81b
github.com/opencontainers/runc/libcontainer/logs.ForwardLogs.func1()
	/home/runc/libcontainer/logs/logs.go:26 +0x6f
created by github.com/opencontainers/runc/libcontainer/logs.ForwardLogs in goroutine 18
	/home/runc/libcontainer/logs/logs.go:25 +0x1a9
goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x7f2ac1bc1aa0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc000184840?, 0xc0000e8400?, 0x1)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000184840, {0xc0000e8400, 0x200, 0x200})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a
os.(*File).read(...)
	/usr/local/go/src/os/file_posix.go:29
os.(*File).Read(0xc00018e0f0, {0xc0000e8400?, 0x0?, 0xc0001e8e68?})
	/usr/local/go/src/os/file.go:118 +0x52
bytes.(*Buffer).ReadFrom(0xc000188d80, {0x9925e0, 0xc00018e0f0})
	/usr/local/go/src/bytes/buffer.go:211 +0x98
io.copyBuffer({0x9926c0, 0xc000188d80}, {0x9925e0, 0xc00018e0f0}, {0x0, 0x0, 0x0})
	/usr/local/go/src/io/io.go:416 +0x147
io.Copy(...)
	/usr/local/go/src/io/io.go:389
os/exec.(*Cmd).writerDescriptor.func1()
	/usr/local/go/src/os/exec/exec.go:560 +0x34
os/exec.(*Cmd).Start.func2(0x0?)
	/usr/local/go/src/os/exec/exec.go:717 +0x2c
created by os/exec.(*Cmd).Start in goroutine 18
	/usr/local/go/src/os/exec/exec.go:716 +0xa0a
FAIL	github.com/opencontainers/runc/libcontainer/integration	181.397s

cyphar and others added 2 commits May 5, 2024 10:50
Since glibc 2.25, the thread-local cache of the current TID is no longer
updated in the child when calling clone(2). This results in very
unfortunate behaviour when Go does pthread calls using pthread_self(),
which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value.
Unfortunately (and unsurprisingly), the layout of "struct pthread" is
strictly private and could change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks (with
the child_tid set to the cached &PTHREAD_SELF->tid), meaning that as
long as runc is using glibc, when "runc init" is spawned the child
process will have a pointer directly to the cached value we want to
change. With CONFIG_CHECKPOINT_RESTORE=y kernels on Linux 3.5 and later,
we can simply use prctl(PR_GET_TID_ADDRESS).

For older kernels we need to memory scan the TLS structure
(pthread_self() is a pointer to the head of the TLS structure). However,
to avoid false positives we first try known-correct offsets based on the
current structure layouts. If that fails, we scan the 1K block for any
fields that might match. When doing the scan, we assume that the first
field we find that contains the actual TID of the current process is the
field we want.

Obviously this is all very horrific, and if you are reading this in the
future, it almost certainly has caused some horrific bug that I did not
forsee. Sorry about that. As far as I can tell, there is no other
workable solution that doesn't also depend on the CLONE_CHILD_CLEARTID
behaviour of glibc in some way. We cannot "just" do a re-exec after
clone(2) for security reasons.

Sadly, this is all glibc-specific. musl doesn't even allow you to use
CLONE_CHILD_CLEARTID (and they use a different address for the TID
anyway). We could do the memory scan and manually overwrite the address
after clone(2), but we can deal with that in the future if it turns out
people use non-glibc builds and need this fix.

Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit b0654c7)
Signed-off-by: lifubang <[email protected]>
Signed-off-by: lfbzhm <[email protected]>
@lifubang lifubang force-pushed the feat-go-1.22-tid-offset branch from cb1b2dd to 950ff28 Compare May 5, 2024 10:50
@lifubang
Copy link
Member Author

There are some bugs in centos7, let's close this one.

@lifubang lifubang closed this May 12, 2024
@lifubang lifubang deleted the feat-go-1.22-tid-offset branch October 15, 2024 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants