cpuminer-opt segfault when compiled with Clang #440
Comments
A very similar, if not identical, segfault was seen on macOS x86_64, which is a different CPU architecture and a different OS. The crash was also observed on return from send_line, which again suggests stack corruption. It also suggests the symptoms vary somewhat depending on which data has been corrupted.
Not a lot of progress.
Here's a trace of the segfault, produced by adding printf calls that did not affect the crash. The messages are self-explanatory and show the lower 32 bits of the sctx pointer set to all 0. The trace also shows that the pointer changed between the end of stratum_send_line and the return from stratum_send_line, with no code in between except the function return.
Updating Windows 11 from 23H2 to 24H2 made no difference. That was expected because the OS is not suspect.
With the release of v24.7, cpuminer-opt can now be compiled for Windows 11 on ARM CPUs, with caveats.
A mysterious segfault can occur in stratum code. The code implicated by the segfault is in util.c:send_line.
The segfault was traced to corruption of the sctx pointer, with its lower 32 bits zeroed. This suggests either stack or register corruption, since the sctx pointer was passed as a function argument.
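The corruption pattern described above (a 64-bit pointer whose lower 32 bits become zero) can be checked for mechanically. The following is a minimal sketch, not code from cpuminer-opt: the helper names `ptr_low32_zeroed` and `check_ptr` are hypothetical, and the check assumes a 64-bit target. It is only a heuristic, since a valid pointer could in principle have zero low bits, and adding logging may itself change the behaviour of the crash.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of cpuminer-opt.
 * Returns 1 when a non-NULL pointer has all of its lower 32 bits zero,
 * which is the corruption pattern reported for the sctx pointer.
 * Assumes a 64-bit target; on a 32-bit target it can never return 1. */
static int ptr_low32_zeroed(const void *p)
{
    uintptr_t v = (uintptr_t)p;
    return v != 0 && (v & (uintptr_t)0xFFFFFFFFu) == 0;
}

/* Hypothetical checkpoint logger: prints the pointer value at a named
 * location and returns the result of the check above. */
static int check_ptr(const char *where, const void *p)
{
    int bad = ptr_low32_zeroed(p);
    fprintf(stderr, "%s: ptr=%p%s\n", where, p,
            bad ? " (lower 32 bits zeroed)" : "");
    return bad;
}
```

Calling `check_ptr` with the pointer at the end of a function and again immediately after the call returns would narrow down where the value changes, in the same way the printf trace did.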
Neither Linux on ARM nor Windows on x86_64 has this problem.
Both Windows and MSys2/MingW are immature on ARM, and either may be the cause. The test system is a VM running on an Apple host, which may also be the issue. Finally, it may be an issue with cpuminer-opt that only affects the combination of ARM and Windows, but that seems unlikely at this time.
While debugging by adding printf to capture specific data, the segfault magically disappeared. This made it difficult to pinpoint the exact location where the lower 32 bits of the sctx pointer get zeroed. But it also provided a workaround that either prevented the corruption or moved the corruption to some benign data.
This workaround has been made available in v24.7 and can be enabled by adding an option to CFLAGS during compilation.
While debugging, I also tried moving the printf around, and it only worked when placed inside the while loop before the curl call.
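For readers unfamiliar with how a CFLAGS option can switch a workaround on, here is a minimal sketch of the general pattern: a printf compiled in only when a macro is defined, placed inside the send loop ahead of the send call. This is an illustration under stated assumptions, not the actual cpuminer-opt source. `fake_send` and `send_all` are hypothetical names, `fake_send` merely stands in for the real curl call, and the exact statement used by the real workaround may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the real send call (the issue mentions a curl
 * call at this point). Pretends to send at most 4 bytes per call. */
static size_t fake_send(const char *buf, size_t len)
{
    (void)buf;
    return len < 4 ? len : 4;
}

/* Hypothetical send loop. Returns 1 when all bytes were sent, 0 on failure. */
static int send_all(const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
#ifdef ARM_WIN_HACK
        /* Compiled in only when -DARM_WIN_HACK is added to CFLAGS.
         * Assumption: any printf at this position has the same effect;
         * the statement in the real code may be different. */
        printf("%s", "");
#endif
        size_t n = fake_send(buf + sent, len - sent);
        if (n == 0)
            return 0;
        sent += n;
    }
    return 1;
}
```

Without `-DARM_WIN_HACK` the preprocessor removes the printf entirely, so the default build is unchanged and the workaround costs nothing unless requested.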
I also tried sleep(1) in place of printf, which also prevented the segfault. However, usleep, even with 1,000,000 usecs, did not prevent the segfault. This suggests it's not a timing issue or race condition. There's also no one to race with, as all miner threads are waiting for data from stratum.
It should also be noted that the segfault occurs while holding the sock mutex, and in general printf should be avoided in mutex regions. Ironically, in this case it serves to work around a segfault. This could be a hint.
For the time being I'd like to identify which component is responsible for the segfault, so please report your experiences.
Building for Windows 11 on AArch64:
Follow the procedure for building for Linux on ARM with the following changes.
Msys2 does not yet support GCC on ARM so use CLANG. Install the clang mingw64 packages & use the clang shell environment.
The build procedure requires a combination of CFLAGS from the build-msys2.sh and arm-build.sh scripts, and possibly the flag to work around the segfault.
There is no build script for Windows on ARM, so enter the commands manually.
To compile without the workaround run this configure command:
$ CFLAGS="-O3 -march=native -Wall -flax-vector-conversions -D_WIN32_WINNT=0x0601" ./configure --with-curl
If a segfault occurs, run make clean and recompile, adding "-DARM_WIN_HACK" to CFLAGS:
$ CFLAGS="-O3 -march=native -Wall -flax-vector-conversions -D_WIN32_WINNT=0x0601 -DARM_WIN_HACK" ./configure --with-curl
Please report your results: whether it works without the hack, only with the hack, or not at all.
Also report any other issues that may arise.
Please include the CPU model, Windows 11 build & other pertinent details in reports.
Reports can be made here (preferred), in a post on the bitcointalk forum, or by email.
Other errata with Windows on ARM: