-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/syscall/windows/registry: registry subkey enumeration cannot handle thread migrations #49320
Comments
The unsafe version of the function can't be removed, but it can be deprecated in favor of a safer replacement if we can define one. |
Change https://golang.org/cl/361154 mentions this issue: |
Change https://golang.org/cl/362576 mentions this issue: |
CL 361154 updated the standard syscall package to document that successive calls to syscall.RegEnumKeyEx must occur on the same OS thread (after that requirement was empirically discovered in golang/go#49320). This use in x/sys needs to be updated to comply with the newly-discovered requirement. Fixes golang/go#49466. Change-Id: Idc45d91f175e1db25c215213fcaa45982c2f5e6e Reviewed-on: https://go-review.googlesource.com/c/sys/+/362576 Trust: Bryan C. Mills <[email protected]> Run-TryBot: Bryan C. Mills <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
@mknyszek I tried to reproduce this issue on my Windows 10 laptop and windows-amd64-2016 builder and I cannot reproduce it. I build https://go-review.googlesource.com/c/sys/+/363574 and my test fails with
If my change follows your explanation of the problem, the test should fail on line 44 instead of 46. Where did I make mistake? Thank you. Alex |
Pinging @mknyszek . Please reply to #49320 (comment) Thank you. Alex |
1 similar comment
Pinging @mknyszek . Please reply to #49320 (comment) Thank you. Alex |
@alexbrainman The restriction may be limited to windows-386-2008. Perhaps that's a bug that was fixed in later editions. I didn't try to reproduce on later Windows, or windows/amd64. |
My test #49320 (comment) was run on windows-386-2008 and windows-amd64-2016. So still I don't see any "restriction" even on windows-386-2008 builder. What did you do differently from CL 363574 to reproduce this issue? Thank you. Alex |
Then I don't know. Perhaps that assumption is wrong. What I did was I inserted a call to It's possible this is a bug in the syscall path instead, maybe. Feel free to update the documentation if you think what I wrote is wrong. |
@alexbrainman, FWIW this failure mode also reproduced in |
Yes, it is quite possible the problem is in runtime, and not in Windows. You assumed that the problem is in OS, and that is a possibility. But we need some repro (maybe C code that fails in the same way) or at the very least repro in Go, before we should submit code based on that assumption. Your "fix" most likely just hiding a bug in runtime.
CL 362576 assumes that there is a bug in Windows. And we cannot reproduce this bug. So we don't really know why CL 362576 fixes #49466. Like I said above, CL 362576 could be masking bug in runtime. Alex |
I agree with you that an OS bug is unlikely, but I don't believe it to be a bug. I never did. I believe (as I wrote in CL 362576) that I thought this was an undocumented requirement. My reasoning was the following: in C land (or really any other place), it's difficult to imagine that this syscall would be called in a loop across threads. Go is unique here where a thread switch can happen at almost any time. I think such a scenario is far likelier. Clearly my original assertion was wrong in some way, however, and I admit that.
As I said before, I believe we can. Back when debugging this, I could have it fail nearly every time by adding a call to CL 362576 doesn't fix the issue as currently titled, but it does mitigate a real bug. It could be a bug in the runtime, I don't disagree. But the bug is mitigated. Given your attempts at reproduction, we should revert the changes to the documentation. But I don't think we should revert the mitigation. On Monday I will confirm that I can reproduce it. |
With this patch:
I've confirmed that |
Thank you @mknyszek I can reproduce this problem pretty much anywhere. Unfortunately I still doo not know what the problem is. The test fails because RegEnumKeyEx after a few calls returns ERROR_ACCESS_DENIED. RegEnumKeyEx is called in a loop starting with index 0, then 1, then 2 and so on. It always succeeds when called with index of 0. I can see it fails with either index 1 or 2. I also added a call to print GetCurrentThreadId value. And thread id changes every loop iteration. I also checked that RegEnumKeyEx output parameters don't move in memory between the calls (I printed buf and l addresses). I also noticed that https://docs.microsoft.com/en-us/windows/win32/sysinfo/enumerating-registry-subkeys uses KEY_READ instead of KEY_ENUMERATE_SUB_KEYS. So I also replaced registry.ENUMERATE_SUB_KEYS with registry.READ. And that fixes the problem. But https://docs.microsoft.com/en-us/windows/win32/api/winreg/nf-winreg-regenumkeyexw says to use KEY_ENUMERATE_SUB_KEYS, so I don't think my change is appropriate. @zx2c4 perhaps you have some suggestions here. Perhaps you can step into RegEnumKeyEx to see why it returns ERROR_ACCESS_DENIED in our test. Thank you. Alex |
Haven't done anything dynamic yet, but just statically reverse engineering things, it looks like:
The more interesting question is why If that theory holds true, moving to I agree this per-thread cache behavior is weird and surprising, and if it can't be fixed in Windows, it should probably at least be documented, so CCing @jstarks. |
Thanks for doing this. Your explanation sounds reasonable.
Yes. I will leave this issue opened until we hear from @jstarks. Alex |
Yes, this appears to be intentional behavior specifically for There's no obvious way to improve this in the OS without introducing a new API, something that is quite unlikely to occur since there's a reasonable-enough workaround here. I'll try to get the behavior documented. |
Thank you @jstarks for explaining.
We are talking about
Again this is implementation details. I don't see why you need a new API. Unless you mean that new API is required to build similarly fast implementation. Or something else.
Are you referring to the code that keeps https://go-review.googlesource.com/c/go/+/361154/ ? Then we had no other choice. And the requirement to stay on the same thread is not documented anywhere. So @mknyszek had a good guess. I did not believe it is possible for such Windows bug.
That would be nice. Thank you. Closing this issue, because I am satisfied this a Windows bug. Nothing we can do here. Alex |
In trying to land #44167, I discovered that an additional well-placed GC cycle in the
internal/syscall/windows/registry
tests caused one of the tests to mysteriously fail in aRegEnumKeyExW
syscall with "access denied." This syscall is called multiple times in a row, and the first call never was what failed.The total amount of code involved in the test is fairly small, so it was hard to see what could possibly be going wrong. Furthermore, the
RegEnumKeyExW
documentation doesn't indicate with certainty under what conditions it could fail (https://docs.microsoft.com/en-us/windows/win32/api/winreg/nf-winreg-regenumkeyexw#remarks). After looking at some execution traces, I stumbled upon the issue: if the goroutine executing these syscalls migrates between OS threads (not interrupting the syscall mind you, just in between consecutive calls), then subsequent calls will fail.I confirmed this by adding
runtime.LockOSThread
to theReadSubKeyNames
method onKey
, and that seems to fix it.Steps forward here I think are:
runtime.LockOSThread
inReadSubKeyNames
.RegEnumKeyExW
requires the goroutine to be locked to its OS thread.Another potential third step could be to expose a safer abstraction for this in the syscall package itself, but since the API is frozen, I don't think this is possible.
The text was updated successfully, but these errors were encountered: