--max-diff option works not as expected in solo mode #392
Comments
Suppressing startup garbage is an optimization issue. I have chosen to ignore it rather than check for it constantly and unnecessarily. I'll look into the max-diff issue; it should be obvious in the code. There's other bad stuff going on: the share counter, stats not available... Good timing, I have another release planned. |
There is no actual "full stop until net diff decreases to 0.45 or less". This is obvious just from the CPU load. E.g. I found block #5732052, but should not have. |
There's a bug in myr-gr for CPUs below AVX2: it doesn't submit shares properly. It's been that way for a long time. I'll take a look at the mechanics of max-diff. Edit: It looks like only one thread is pausing; you can test this with -t 1. The other threads are apparently still in the loop, unaware, and need a kick. In solo mode new work is handled by the miner threads, while stratum uses the stratum_thread to handle new work. This might explain why the problem appears only in solo mode: the message isn't getting to the other threads. I need to dig deeper. |
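A minimal sketch of one way to make sure every miner thread gets the kick, not just one: the thread that evaluates the condition bumps a shared generation counter, and each thread compares it against its own copy at the top of its scan loop. The names (`g_paused`, `g_work_gen`, `set_paused`, `thread_needs_kick`) are hypothetical and not cpuminer-opt's actual code; C11 atomics are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state. Instead of delivering a pause message to a
   single thread, the thread that detects the condition bumps a
   generation counter that every miner thread polls. */
static atomic_bool g_paused = false;
static atomic_uint g_work_gen = 0;

/* Called by whichever thread evaluates the condition (e.g. max-diff). */
void set_paused(bool paused)
{
    atomic_store(&g_paused, paused);
    atomic_fetch_add(&g_work_gen, 1u);  /* every thread will see the change */
}

/* Called by each miner thread with its own copy of the generation.
   Returns true once per change, telling that thread to abandon its
   current scan and recheck g_paused. */
bool thread_needs_kick(unsigned *local_gen)
{
    unsigned gen = atomic_load(&g_work_gen);
    if (gen == *local_gen)
        return false;
    *local_gen = gen;
    return true;
}
```

Because each thread keeps its own `local_gen`, one thread consuming the notification doesn't hide it from the others.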
The max-diff problem appears to also affect max-temp and others. Try adding: The myr-gr fix involves replacing a block of code in algo/groestl/myr-groestl.c/scanhash_myriad. It should look the same as algo/groestl/groestl.c/scanhash_groestl:
|
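The replacement block itself isn't reproduced here. For reference, here is a minimal sketch of the hash-vs-target test that any scanhash loop depends on before submitting a share, in the style of classic cpuminer's word-by-word comparison. It assumes the 256-bit hash and target are stored as 8 x 32-bit words, least significant word first. `hash_meets_target` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if hash <= target. Both are 256-bit values stored as
   8 x 32-bit words, least significant word first (assumed layout),
   so the comparison starts from word 7. */
bool hash_meets_target(const uint32_t hash[8], const uint32_t target[8])
{
    for (int i = 7; i >= 0; i--) {
        if (hash[i] > target[i])
            return false;
        if (hash[i] < target[i])
            return true;
    }
    return true; /* equal to the target still counts as valid */
}
```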
Try stratum+tcp://pool.cryptopowered.club:1304 using wallet GfWkqzKQfQDMQxjwi5iJCDbzhsCNxGLKHr and password x; share TTF is ~30 s at 15 MH/s. |
Both problems have fixes. There was another problem with conditional mining: after the initial 5 second pause it resumed without rechecking the condition (i.e. max-diff). It also affected max-temp and max-hashrate. I also added a complementary resume log. Next release. |
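Here is a sketch of the corrected pause logic under stated assumptions. The condition is re-evaluated after every pause, so mining no longer resumes unconditionally after the first 5 second wait. The function-pointer hooks and all names are hypothetical. The demo feeds a falling net diff against the 0.45 limit from the original report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hooks, passed in so the loop can run without a node. */
typedef double (*net_diff_fn)(void);
typedef void (*sleep_fn)(unsigned seconds);

/* max_diff <= 0 means the option is disabled. */
static bool max_diff_exceeded(double net_diff, double max_diff)
{
    return max_diff > 0.0 && net_diff > max_diff;
}

/* Stays paused while the condition holds, re-evaluating it after every
   pause instead of resuming unconditionally after the first one.
   Returns the number of pauses taken. */
unsigned wait_while_over_max_diff(net_diff_fn get_net_diff, double max_diff,
                                  sleep_fn sleep_for, unsigned pause_sec)
{
    unsigned pauses = 0;
    while (max_diff_exceeded(get_net_diff(), max_diff)) {
        sleep_for(pause_sec);
        pauses++;
    }
    return pauses;
}

/* Usage demo: net diff drops below 0.45 on the third poll. */
static const double demo_diffs[] = { 0.80, 0.60, 0.40 };
static unsigned demo_idx = 0;
static unsigned demo_slept = 0;

double demo_get_net_diff(void)
{
    double d = demo_diffs[demo_idx];
    if (demo_idx < 2)
        demo_idx++;          /* stay on the last value afterwards */
    return d;
}

void demo_sleep(unsigned seconds) { demo_slept += seconds; }
```

With the sequence 0.80, 0.60, 0.40 the loop pauses twice (10 s total) and resumes only once net diff is at or below 0.45.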
There is a secondary issue with max-diff and stratum: the stratum server will time out after 5 minutes with no shares and stop sending new blocks. The miner won't see new blocks with lower diff and will never resume, resulting in deadlock. Adding the --stratum-keepalive option should prevent the deadlock by resetting the stratum connection so the server starts sending new blocks again. GBT/getwork should not be affected. FYI |
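A minimal sketch of the keepalive decision, assuming the pool's 5 minute no-share timeout described above. The function name and the 300 s constant are hypothetical illustrations, not how `--stratum-keepalive` is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Assumed pool behavior: after this many seconds without a share the
   server stops sending new jobs. */
#define POOL_IDLE_TIMEOUT_SEC 300.0

/* Hypothetical keepalive check: while conditional mining is paused no
   shares are submitted, so once the pool has gone quiet, reconnect to
   make it start sending jobs (and lower-diff blocks) again. */
bool stratum_needs_reset(time_t now, time_t last_share, bool mining_paused)
{
    return mining_paused && difftime(now, last_share) >= POOL_IDLE_TIMEOUT_SEC;
}
```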
Okay, but I personally see no reason to use max-diff with stratum. With stratum mining you are not tied to net diff; that's one of the points of the protocol itself. If the client doesn't like the stratum diff, it's better to use another port (with lower diff or vardiff) or another pool. Some years ago I did a trick with my private build of a GPU miner, where 2 instances mined together, one on a pool and one solo. The point was: solo < maximum_acceptable_solo_diff_value < pool. It was all about saving on pool fees, so only one instance was actually mining at any moment. There is no min-diff option in cpuminer-opt though. |
I disagree, max-diff is based on net diff, stratum diff means nothing. |
net diff 500Ph => job diff 5Mh. I don't know why I should care about net diff... In solo mode TTF could range from seconds to years :D |
Stratum diff doesn't change profitability. When the stratum diff changes, the share value changes to offset the change in share rate. |
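Here is a worked version of the profitability argument. It assumes a proportional per-share payout, with value per share k*D for stratum difficulty D. It also assumes the sha256d convention that difficulty 1 corresponds to 2^32 expected hashes. With H the hashrate, r the share rate and v the value per share:

```latex
r = \frac{H}{2^{32} D}, \qquad v = k D
\quad\Longrightarrow\quad
r \, v = \frac{k H}{2^{32}}
```

D cancels, so the expected income is unchanged. What D does change is the variance (how long you may wait between shares), which is the practical concern in the next comment.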
Theoretically. Practically, no shares in a month means zero profitability (or no block in a month, as you wish). Some guaranteed shares with a low stratum diff in a month is non-zero profitability, if the pool is powerful enough to find blocks. Multiply that by your 10-100-1000 workers or rigs and things get even better. Not including the luck factor, of course (some guys have been lucky enough to find an ETH or BTC block solo with a single GPU these days). Some pool ops do rob people with a 3-5-10% fee :) |
That's #379 & #389. I put that code in to detect potential segfaults before they occur. It's interesting it didn't crash. The interesting part is the message confirms that address is only aligned to 16 bytes, AVX2 requires 32, and would have
|
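Here is a minimal alignment check of the kind such a detector could use, assuming a power-of-two alignment. This is a hypothetical helper, not the miner's actual misalignment log code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of `align` bytes; align must be a power of two.
   Aligned 256-bit AVX2 loads/stores (e.g. _mm256_load_si256) need 32,
   while a typical 64-bit malloc only guarantees 16. */
bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```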
Yes. With argon2d4096 I can reproduce it 100% of the time.
It affects only the cpuminer-avx2-sha, cpuminer-avx2-sha-vaes and cpuminer-avx2 builds. The plain avx build is OK. Another interesting thing is that only the avx2-sha build got yellow affinity colours :D |
Regarding the misalignment problem... It's too bad you can't compile, another missed opportunity. More testing is required. Until then I can only assume it would have crashed for you if compiled with gcc-11. Prior to gcc-11 the compiler didn't vectorize this loop. But that means there may be 2 compiler bugs:
Bug 1 seems to be present in both versions. The crash only occurs with gcc-11 due to more aggressive autovectoring and bug 2. |
The affinity issue was also reported previously. I assume it may be related to Windows CPU groups, but I have nothing more than that at this time. Builds for older CPUs don't support Windows CPU groups and use the old method. CPU groups were never properly tested. |
I can (and did before in other issues) in a Linux build env. Setting up a gcc-11 build env natively in Windows will be... omg. There's something like this https://sourceforge.net/projects/gcc-win64/files/12.2.0/ but it may lead to rewriting cpuminer-opt... |
Statistically there is a 50% chance the address will be aligned in any build. |
I don't know anything about that sourceforge link but the MSys/MingW procedure is straightforward. About the affinity issue, try to see what triggers it and what CPU and core counts are involved. Also when the error occurs are all threads in error or just some? |
So is GCC 12.2 (and its libs) suitable? |
Yes, I use it. I think it's the default install now. You can also install gcc-11 and g++-11. Those 2 additional packages should get everything you need to compile with gcc-11. To use the non-default version you can set the following env vars before building...
I appreciate the effort. It makes more work for me too, but I don't like mysteries. If you get msys up and running, here's a suggested test plan: watch out for a pointer that happens to be aligned, i.e. no misaligned log. If a particular build has a naturally aligned address, it is useless for testing. Try different AVX2 builds until you find one with the misaligned pointer. If you can't find one, particularly with gcc-12, it may be a sign of a fix. A newer gcc-11 may also have a fix. With 3 builds to choose from, if none of them is misaligned there's a 1 in 2**3 chance it's not fixed and is just luck.
|
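Here are the odds quoted in the test plan, worked out under one assumption. Each build's heap address is taken to be independently 16-byte aligned and equally likely to also be a multiple of 32:

```latex
P(\text{32-byte aligned} \mid \text{16-byte aligned}) = \tfrac{1}{2},
\qquad
P(\text{all 3 builds aligned by chance}) = \left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8}
```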
Some other notes about affinity. I have a Windows 10 system with an 8-thread CPU with CPU groups enabled and haven't seen this error. Did you configure CPU groups differently, or do you have a setup, like dual socket, Windows Enterprise or NUMA, that may have a different default CPU group configuration? This is going beyond my knowledge so I don't know how far I can take it. Edit: I'm looking for a debug log describing the number of CPU groups found, but I don't see it in any of your posts. It requires -D. Edit2: It appears there have been changes made to CPU groups in Win11. Are you using Win11? |
My system is a single-CPU one, Win 10 Pro 21H1. Nothing special (yes, that CPU is a real rarity, but it's basically a 5700G with lower TDP, slightly lower multi-core performance and slightly better single-core performance; check here if you have any questions https://www.cpubenchmark.net/compare/4323vs4387/AMD-Ryzen-7-5700G-vs-AMD-Ryzen-7-5700GE). I see some changes here https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support (scroll down to the "Behavior starting with Windows 10 Build 20348" section), but mine is 19043 |
The debug log I'm looking for is still missing. It should be displayed before the CPU capabilities. Edit: I did a quick test and reproduced the log problem but not the affinity error. It seems that's a separate problem. |
As for me, I don't usually bruteforce an optimal build, so there's no way I'd use the cpuminer-avx2-sha build if the cpuminer-avx2-sha-vaes one works okay.
I've never seen "Found n cpus in cpu group n", even with "-D". I think you mean this: If it's not shown with such a CLI string, then something is not defined, stripped by the compiler, or filtered by the log level. I've compiled from git in the Linux subsystem, and there's no such message: |
The missing log doesn't seem to be related to the affinity errors, it just provides more details about number of groups and group size. The error when setting affinity is the real problem, it hasn't been reported before. |
I think we may be losing focus on the issues discussed here, there are so many. I'll summarise.
3 & 4 is where I need help. I can't reproduce those errors. |
Possibly. I don't know. Link: I had an idea: stop the wallet, launch cpuminer-opt from a debugger, then open the wallet => the "delayed" crash on getting work should happen. UPD.: Not so useful |
TLDR: last-minute change, jump to the bottom. I don't know what code this is, but it looks string related. RtlReportCriticalFailure indicates heap corruption. rtlIsAnyDebuggerPresent is the kernel debugger, and from its address the function isn't very far from this code. It looks like kernel code checking string integrity. The only string related changes were to a couple of logs, but struct work has strings as well as arrays. I assume that arrow is where it crashed. It's a software interrupt; int 3 is a breakpoint. That means the previous "test esi,esi" failed (I assume non-zero) and fell through to the interrupt. rtlIsZeroMemory, in the previous function, is interesting.

This popped up from a search... https://stackoverflow.com/questions/24183982/error-from-ntdll-dll-because-of-a-malloc-c Reading through it, I wonder if the heap manager was confused about where the block actually started due to the way mm_malloc works. If the heap manager assumed standard alignment when the block was actually aligned differently, it would definitely cause a crash. Maybe useful after all.

Everything seems to be pointing to mm_malloc now. Using 2 pointers: the block pointer is returned by malloc, and the struct pointer is the block pointer adjusted for alignment. The block pointer is used by free, the struct pointer is used by the application. The heap manager should only see the block pointer; only the local function will be aware of the struct pointer. Edit: Edit: I wouldn't do this with mm_malloc. I'd go back to the original calloc and write a common aligner utility, as previously described, so they'd be in sync. Only mining code would know about the struct pointer, while tq, heap etc. would be blissfully ignorant. I assume all the list software is type agnostic, only having a pointer and size. Edit: I'm feeling pretty good about this. The only concern is that it's based on the assumption there is an issue between mm_malloc and the Windows kernel. By managing the pointer myself I hope to work around any such issue.
The kernel will never see the aligned pointer, and miner code will always use the aligned pointer. Edit: can you look this over? I want to make sure I don't make a silly logic error with align_ptr. miner.h workio_get_work pushes heap_ptr get_work pops heap_ptr Start here: Edit: If this doesn't work, my only option is to disable loop auto-vectorization in gbt_work_decode and revert to the original code. Edit: Actually the last option might be the best one. I can hand code the loop with the same number of SSSE3 instructions as the compiler generates with AVX2; SSE2 takes a couple more. It's also functionally identical to the memrev function that was imported for segwit. Hand coding should override auto-vectorization, eliminating the need for 32 byte alignment. It looks like it might be a win-win. I need to sleep on it; I don't want to make a decision while tired. |
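Here is a minimal sketch of the two-pointer scheme described above, assuming plain `calloc` underneath. The heap pointer is the only one that reaches `free` (and any queue that owns the work), while mining code uses the aligned view. `align_up_ptr` and `aligned_block_t` are hypothetical names; the signature of the miner's real `align_ptr` isn't shown in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the miner's align_ptr): round p up to the
   next multiple of align, which must be a power of two. */
static void *align_up_ptr(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* heap_ptr is exactly what calloc returned; it is the only pointer the
   heap (and any queue that frees the work) ever sees. data is the
   aligned view used by the mining code. */
typedef struct {
    void *heap_ptr;
    void *data;
} aligned_block_t;

/* Over-allocates by align - 1 bytes so `size` aligned bytes always fit. */
bool aligned_block_alloc(aligned_block_t *b, size_t size, size_t align)
{
    b->heap_ptr = calloc(1, size + align - 1);
    b->data = b->heap_ptr ? align_up_ptr(b->heap_ptr, align) : NULL;
    return b->heap_ptr != NULL;
}

void aligned_block_free(aligned_block_t *b)
{
    free(b->heap_ptr);  /* never pass b->data to free() */
    b->heap_ptr = NULL;
    b->data = NULL;
}
```

Keeping both pointers in one struct means the pair can't drift out of sync, which is the "common aligner utility" idea.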
I'm not sure, but it may be a good idea to make a copy of the cross build script with the "-ggdb" option, so each time you publish a .zip archive there'd also be a copy with debug builds. That way any future release could be debugged more easily and quickly: new issue = detailed info from the beginning. The good thing is that it would be compiled with all the versions you use, not some random ones like GCC 12 or something. Some coin devs also do this with their wallets (and by devs I mean devs, not shipcoin copycats). As you use -j 8 now, compiling a second pack should be very fast. I could then write a small doc on installing and using gdb in Windows (through a PR here). Installing VS to do such things is a real waste. |
The puzzle is coming together. Stratum does not crash because there is no similar loop in stratum. First, stratum calculates the target from nbits, while GBT provides the target explicitly. Second, stratum copies data directly to g_work without the intermediate, dynamically allocated, misaligned temp work struct. Confidence is rising. Even though stratum can't test the optimized hand-coded memrev replacement, I can write a little test routine to do a dummy copy with test data and verify correct functionality. Regarding your ggdb suggestion: it might be a good idea, at least for the next release or until this issue is resolved. However, I am interested in WSL as another way to use cpuminer-opt on Windows. Any special procedures there? |
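For comparison with the GBT path, here is a sketch of the compact `nbits` expansion that stratum-style code performs. It assumes the standard Bitcoin-derived encoding (target = mantissa * 256^(exponent - 3)) and a little-endian byte layout for the result. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Expands a compact nbits value into a 256-bit target stored as 32
   little-endian bytes (target[0] least significant), using
   target = mantissa * 256^(exponent - 3).
   Returns false for a negative (sign bit set) or overflowing value. */
bool nbits_to_target(uint32_t nbits, uint8_t target[32])
{
    int exponent = (int)(nbits >> 24);
    uint32_t mantissa = nbits & 0x007fffffu;

    memset(target, 0, 32);
    if (nbits & 0x00800000u)
        return false;

    for (int i = 0; i < 3; i++) {          /* mantissa bytes, LSB first */
        uint8_t byte = (uint8_t)(mantissa >> (8 * i));
        int pos = exponent - 3 + i;
        if (pos < 0)
            continue;                      /* shifted out: exponent < 3 */
        if (pos >= 32) {
            if (byte)
                return false;              /* does not fit in 256 bits */
            continue;
        }
        target[pos] = byte;
    }
    return true;
}
```

For example, 0x1d00ffff expands to the familiar diff-1 target, with 0xffff in bytes 26 and 27.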
Well, I just use the ones you can get with "wsl --list --online" from PowerShell. Affinity works OK from there. Another useful thing: when you type "explorer.exe ." from your root there, your Linux filesystem opens in Windows Explorer, so there's no need to mount anything or use network drives or HTTP servers to share files. |
Test results Test Code (printf deleted for brevity): Results, tested with AVX2, AVX & SSE2, all good: mm128_bswap_128 is new, implemented differently for SSE2 and SSSE3.
Edit: I'm going to have to use this for memrev because it is also used with the heap and could be autovectorized. |
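The actual test code isn't reproduced here. As a portable reference point, here is a sketch of an alignment-agnostic byte reversal. It only does byte accesses, so nothing in it assumes more than byte alignment, and a misaligned start address is fine. `memrev_bytes` is a hypothetical name, not the miner's memrev or `mm128_bswap_128`.

```c
#include <assert.h>
#include <stddef.h>

/* Reverses len bytes in place. Only byte accesses are used, so the
   buffer needs no particular alignment. */
void memrev_bytes(unsigned char *p, size_t len)
{
    if (len < 2)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        unsigned char t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}
```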
Do you have a public drop box on MEGA? I could drop a new [debug] binaries package instead of publishing a release. Or maybe I can attach it to an email, though it's a little big. |
Just make a pass-protected zip or 7z archive and upload to https://www.file.io/ Small file, no need for privacy. |
I'm going to reintegrate the changes from v3.21.3, minus the prehash centralization. It was an experiment, a proof of concept that didn't work out as expected. It was intended to improve the scaling to large numbers of threads but actually had a negative effect in some cases. I don't think I'll build with -ggdb. It could change the build in a way that invalidates the testing. I'm confident now the problem is solved. |
https://file.io/3nFv5PwMc9dI I left the misaligned log available but it will be removed later. Misalignment of the data is normal. Edit: I'm rebuilding for publishing. I forgot to update the RELEASE_NOTES. I also removed the misaligned log. Your testing is not affected. |
I see the crash is solved. I just need some time (~8-10 hours) to find some different algo blocks to confirm. That's about the build from that archive. I've put 2 coins w/ 4 algos in total. All of 'em with 2 threads and affinity. I won't be able to test avx512 builds - no CPU. It was not in v3.21.3. Miner TTF calculation is now wrong: |
Net_diff 0 was a stupid mistake, deleted too many lines. I only need gbt tested, arch doesn't matter. Edit: Are you using segwit? There are 2 byte reversal loops in segwit code that I also optimized and made alignment agnostic. You can retest the subissues if you want like groestl with arch < AVX2, affinity with arch >= AVX2, conditional mining any arch. |
It seems so, yes; it's shown in the log. I'm not sure how it's configured in the wallet or network(s) |
Any other issues with the logs that I might be able to address now? We already discussed net hashrate with multiple algos. Anything else? Anything from the last stable release, v3.21.2? Otherwise I'm satisfied and ready to release. |
I've redirected the CLI output with ">> log.txt", so I'll read it tomorrow. It's almost night in my time zone. |
Have a good night and thank you. Edit: If segwit is enabled it means the wallet wants it. It also means the test provided the necessary code coverage to pass. Edit: I did a little reading about WSL and it seems to confirm the NT kernel was at the root of the heap corruption crash. WSL2 (I assume you were using WSL2) has a real Linux kernel. The Windows binaries package obviously uses the Windows kernel. This seems like a good choice for the preferred way to run cpuminer-opt on Windows. First, I won't need to build binaries anymore; users can build for their specific architecture. It's Windows software and works mostly from a Windows perspective, so it won't chase away linux-phobes. With native performance I can't think of any negatives. Edit: I'm mostly interested in the new block log for GBT; I rarely ever see one. Most of the others I can test with stratum. Diff targeting is different for GBT, but it takes a very large sample to test the target threshold, and a very long time to obtain a suitable sample of blocks solo. There aren't any significant changes to any algos except those noted in the release. Luffa had a tweak, but it would only affect the AVX512 X16R family when Luffa is first in the hash order. I have previously tested it. |
Btw I see nothing interesting... Please note the time and zone are wrong |
The only thing I noticed are TTF 0 and solutions aren't reported as BLOCK SOLVED. I suspect they're both related to netdiff being zero. I don't expect net TTF to be correct for multi-algo coins because it's calculated using the current algo's diff and the total hashrate of all algos. If all found blocks were accounted in the wallet all is good. I've submitted v3.21.5, The netdiff issue should be fixed now. |
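Here is a sketch of the TTF arithmetic. It assumes difficulty 1 corresponds to 2^32 expected hashes, the sha256d convention; cpuminer-opt may scale this per algorithm. It shows why a net diff of 0 produces a TTF of 0 regardless of hashrate. `ttf_seconds` is a hypothetical name.

```c
#include <assert.h>

/* Expected seconds to find a hash at difficulty `diff` with `hashrate`
   H/s, assuming diff 1 corresponds to 2^32 expected hashes (sha256d
   convention; per-algo scaling is not modeled). Returns a negative value
   when the hashrate is not known yet. */
double ttf_seconds(double diff, double hashrate)
{
    if (hashrate <= 0.0)
        return -1.0;
    return diff * 4294967296.0 / hashrate;
}
```

For multi-algo coins the same formula also explains the skewed net TTF. Pairing one algo's diff with the combined hashrate of every algo inflates the denominator, so the estimate comes out shorter than it really is for that algo.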
Two things of interest in this extract from the log. The hash was rejected for unknown reasons. The hash was below target, so it wasn't low diff. However, the reject was immediately followed by a new block, so either you got it or someone else did just before you; in other words, stale. The other thing is the debug log trying to report Xnonce2; there's no xnonce in GBT. This log is unnecessary. The nonce is output in the block header data of another debug log. The xnonce2 value can also be included in the other log for stratum. The thread was never of any real value, but the log included the lane, which was very useful during 4way development. [2023-03-15 11:01:42] 16 Submitted Diff 0.00059184, Block 3422330, Ntime 627b1164 Edit: the logs also prove the misaligned crash is solved: AVX2, misaligned pointer, and no crash. |
Yes, stale. I got these before; I checked some timestamps on different chains and they were confirmed as stale. I'd forgotten about them.
Great! I'm on it. I need some time to confirm everything is okay. The "Block solved" message is in its place: |
Looks good. Miner TTF looks reasonable; net TTF is zero due to the extremely high sha256d hashrate. A little hint about target and diff: GBT provides a precise target hash, and the net_diff is calculated from it. |
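Here is a sketch of deriving diff from a 256-bit target such as the one GBT provides. It assumes the Bitcoin diff-1 target 0x00000000FFFF0000...0000 (0xFFFF * 2^208) and 8 x 32-bit words stored least significant first. Double precision is assumed to be adequate for logging and TTF, not for the exact submit comparison. `target_to_diff` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a 256-bit target (8 x 32-bit words, least significant first)
   to a difficulty relative to the assumed diff-1 target 0xFFFF * 2^208.
   Returns 0 for an all-zero target. */
double target_to_diff(const uint32_t target[8])
{
    double t = 0.0;
    for (int i = 7; i >= 0; i--)
        t = t * 4294967296.0 + (double)target[i];   /* accumulate 2^32 digits */
    double diff1 = 65535.0 * 0x1p208;               /* 0xFFFF << 208 */
    return t > 0.0 ? diff1 / t : 0.0;
}
```

Applying the same conversion to a hash instead of a target gives the share diff shown in the submit logs.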
It feels good to close this issue. It was a learning experience. |
In your last screenshot I noticed that the mining info log reported a different net diff than the new block log, and also a different block height. The provided target is used by GBT to test a candidate hash for submission; the new block netdiff is used to estimate TTF and to grade a share as block solved. If you see any discrepancies, feel free to open another issue. It's not complicated; I can switch to using the mininginfo net diff instead. |
I need the new block log for the same block as getmininginfo so I can compare the diff. Edit: Another possibility is that the diff in getmininginfo is also for the previous block. That needs to be confirmed. I also got WSL working. It creates a VM with a network connection to the host. That's how it can bypass the Windows kernel. |
It is.
My common settings ("-t 4 --cpu-affinity 21760", "-t 4 --cpu-affinity 85" or "-t 8 --cpu-affinity 0x5555") do work well. Is your version WSL1 or WSL2? (wsl --status) |
I assume they match the calculated diff for the same block so there is no issue?
wsl2, Win10, ubuntu 20.04 (default). I tried to set distro to ubuntu-22.04 but it wouldn't accept the -D option. Edit: 0x5555 is overkill for -t 4, that's 8 bits set. It works because the excess bits are ignored but 0x55, binary 01010101, is correct. |
Yes, no issues here.
No, it's for -t 8, check above :) |
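Here is a sketch of how masks like 0x55, 0x5555 and 21760 (0x5500) can be built. It assumes SMT siblings are numbered adjacently, so every second logical CPU is a distinct physical core. `every_nth_cpu_mask` is a hypothetical helper, not a cpuminer-opt option.

```c
#include <assert.h>
#include <stdint.h>

/* Builds an affinity mask with `threads` bits set, starting at CPU
   `first` and advancing `stride` CPUs per thread (stride 2 = one thread
   per physical core when SMT siblings are adjacent, an assumed topology). */
uint64_t every_nth_cpu_mask(unsigned threads, unsigned first, unsigned stride)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < threads; i++) {
        unsigned cpu = first + i * stride;
        assert(cpu < 64);                 /* a single 64-bit mask only */
        mask |= UINT64_C(1) << cpu;
    }
    return mask;
}
```

For -t 4 starting at CPU 0 that gives 0x55 (binary 01010101). For -t 8 it gives 0x5555, and starting at CPU 8 with -t 4 gives 0x5500.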
Thanks. It's nice to confirm the accuracy of converting hash to diff. |
Version is 3.21.1
cpuminer-avx2-sha-vaes.exe -t 8 --cpu-affinity 0x5555 -a algo -o http://127.0.0.1:54321 -u aaa -p bbb --max-diff=0.45
This option seems to have some effect: mining somehow stops on a high-diff job, but then it continues.
I'm not sure whether you have a solo environment; I'll make detailed logs if needed.
There's another micro issue nearby:
This demotivating block TTF at startup makes people close the miner and forget about it rather than wait for correct stats :'( It might be better to hold it back a bit at startup, but I'm not sure.