Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--max-diff option works not as expected in solo mode #392

Closed
YetAnotherRussian opened this issue Mar 5, 2023 · 112 comments
Closed

--max-diff option works not as expected in solo mode #392

YetAnotherRussian opened this issue Mar 5, 2023 · 112 comments

Comments

@YetAnotherRussian
Copy link

Version is 3.21.1
cpuminer-avx2-sha-vaes.exe -t 8 --cpu-affinity 0x5555 -a algo -o http://127.0.0.1:54321 -u aaa -p bbb --max-diff=0.45

image

Seems this option makes sense, mining stops somehow on a high diff job, but then it continues.

I'm not sure if u got some solo env, gonna make detailed logs, if needed.

There's another micro issue nearby:

image

This demotivating block ttf on start makes people to close and forget rather than to wait for a correct stats :'( Better to slow it down a bit on start, but I'm not sure.

@JayDDee
Copy link
Owner

JayDDee commented Mar 5, 2023

Suppresssing startup garbage is an optimization issue. I have chosen to ignore it rather than check it constantly and unnecessarilly.

I'll look into the max-diff issue, it should be obvious in the code.
Edit: I don't see a problem, netdiff for block 5732040 is 0.67, max-diff= 0.45, so it should pause.

There's other bad stuff going on, share counter, stats not availailable...

Good timing, I have another release planned.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 5, 2023

Edit: I don't see a problem, netdiff for block 5732040 is 0.67, max-diff= 0.45, so it should pause.

There is no actual "full stop until net diff decreases to 0.45 or less". This is obvious just by CPU load. E.g. I've found #5732052 but should not.

@JayDDee
Copy link
Owner

JayDDee commented Mar 5, 2023

There's a bug in myr-gr for CPUs below AVX2. It doesn't submit shares properly. It's been that way for a long time.
That's why the stats are messed. It also explains the obsolete "submitted by thread" log.

I'll take a look at the mechanics of max-diff.

Edit: It looks like only one thread is pausing, you can test this with -t 1. The other threads are in the loop apparently unaware and need a kick. New work solo is handled by the miner threads, stratum uses the stratum_thread to handle new work. This migh explain why the problem appears to be only solo. The message isn't getting to the other threads. I need to dig deeper.

@JayDDee
Copy link
Owner

JayDDee commented Mar 5, 2023

The max-diff problem appears to also affect max-temp and others. Try adding:
if ( !state ) restart_threads();
at the end of cpu-miner.c:wanna_mine. That will kick the other miner threads. I can do some basic testing with stratum but I can't test solo.

The myr-gr fix involves replacing a block of code in algo/groestl/myr-groestl.c/scanhash_myriad. It should look the same as algo/groestl/groestl.c/scanhash_groestl:

if (hash[7] <= Htarg )
if ( fulltest(hash, ptarget) && !opt_benchmark )
{
pdata[19] = nonce;
submit_solution( work, hash, mythr );
}
/* delete
if (hash[7] <= Htarg && fulltest(hash, ptarget))
{
pdata[19] = nonce;
*hashes_done = pdata[19] - first_nonce;
return 1;
}
*/
I can test myr-gr if I can find a share at the pool, but with an unoptimized build it could take a long time.
Edit: Testing myr-gr is futile in a pool, share TTF is 9 hours with i9-9940x 28 threads.

@YetAnotherRussian
Copy link
Author

Testing myr-gr is futile in a pool, share TTF is 9 hours with i9-9940x 28 threads.

Try stratum+tcp://pool.cryptopowered.club:1304 using wallet GfWkqzKQfQDMQxjwi5iJCDbzhsCNxGLKHr and pass x, share TTF is ~30sec @ 15Mh

@JayDDee
Copy link
Owner

JayDDee commented Mar 6, 2023

Both problems have fixes. There was another problem with conditional mining that resulted in resuming after the initial 5 second pause without rechecking the condition (ie max-diff). It also affected max-temp and max-hashrate. I also added a complementary resume log.

Next release.

@JayDDee
Copy link
Owner

JayDDee commented Mar 6, 2023

There is a secondary issue with max-diff with stratum because the stratum server will timeout after 5 minutes with no shares and stop sending new blocks. The miner won't see new blocks with lower diff and never resume resulting in deadlock.

Adding --stratum-keeplalive option should prevent deadlock by resetting the stratum connection to start sending new blocks.

GBT/getwork should not be affected.

FYI

@YetAnotherRussian
Copy link
Author

FYI

Okay, but I personally see no reason to use max-diff with stratum. With stratum mining you are not associated with net diff, that's among the points of protocol itself. If the client doesn't like stratum diff, better to use another port (with lower diff or vardiff) or pool.

I did a trick with my private build of gpu miner some years ago, when 2 instances mine together, one on pool and one solo. The point of that was like this: solo < maximum_acceptable_solo_diff_value <pool. It was all about pool fee economy. So, only one instance was actually mining at the moment.

There is no min-diff option in cpuminer-opt though.

@JayDDee
Copy link
Owner

JayDDee commented Mar 6, 2023

Okay, but I personally see no reason to use max-diff with stratum. With stratum mining you are not associated with net diff, that's among the points of protocol itself. If the client doesn't like stratum diff, better to use another port (with lower diff or vardiff) or pool.

I disagree, max-diff is based on net diff, stratum diff means nothing.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 7, 2023

I disagree, max-diff is based on net diff, stratum diff means nothing.

net diff 500Ph => job diff 5Mh
net diff 500Gh => job diff 5Mh
net diff 50Mh => job diff 5Mh

I don't know why should I take care of net diff... In solo mode TTF could change between seconds and years :D

@JayDDee
Copy link
Owner

JayDDee commented Mar 7, 2023

Stratum diiff doesn't change profitability. When stratum diff changes the share value changes to offset the change in share rate.
Net diff, block reward & exchange rate are the factors that determine profitability, whether mining solo or in a shared pool.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 7, 2023

Stratum diiff doesn't change profitability.

Theoretically. Practically, no shares in a month is zero profitability (or no block in a moth, as you wish). Some guaranteed shares with low stratum diff in a month is non-zero profitability, if the pool is powerful enough to find blocks. If you multiply that by your 10-100-1000 workers or rigs, things should go even better. Not including luck factor, of course (some guys were lucky enough to find eth or btc block in solo and single gpu mining in our days).

Some pool ops do rob people with 3-5-10% fee :)

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 9, 2023

Today I also got this (just won't copycat issues):

image

image

image

This is argon2d4096

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

That's #379 & #389. I put that code in to detect potential segfaults before they occur. It's interesting it didn't crash.
This means the loop following the. message was not optimized with AVX2 else it would have crashed
I assume this is on Windows with the prebuilt binaries using gcc-9. The 2 crash issues were using gcc-11 on Linux.

The interesting part is the message confirms that address is only aligned to 16 bytes, AVX2 requires 32, and would have
crashed with autovectoring. Also interesting is target is the first element of work struct and both the struct and target are
supposed to be aligned to 64 bytes, enough for AVX512. The big question is why isn't it?

struct work
{
uint32_t target[8] __attribute__ ((aligned (64)));
uint32_t data[48] __attribute__ ((aligned (64)));
...stuff deleted...
} __attribute__ ((aligned (64)));

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 9, 2023

I assume this is on Windows with the prebuilt binaries using gcc-9

Yes

With argon2d4096 can reproduce 100%

"target": "00000fffff000000000000000000000000000000000000000000000000000000",
   "curtime": 1678398092,
   "noncerange": "00000000ffffffff",
   "sigoplimit": 1000000,
   "bits": "1e0fffff"
},
"id": 0
}
[2023-03-10 00:41:32] Misaligned work->target 000000000427EF10

Affects only cpuminer-avx2-sha, cpuminer-avx2-sha-vaes and cpuminer-avx2 builds. Simple avx build is OK.

Another interesting thing is that only avx2-sha build got yellow affinity colours :D

image

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

Regarding the misalignment problem...

It's too bad you can't compile, another missed opportunity. More testing is required.

Until then I can only assume it would have crashed for you if compiled with gcc-11. Prior to gcc-11 the compiler didn't vectorize this loop. But that means there may be 2 compiler bugs:

  1. alignment attribute not working, gcc-9 & 11,
  2. autovectoring without checking for adequate data alignment or ensurring adequate alignment, gcc-11

Bug 1 seems to be present in both versions. The crash only occurs with gcc-11 due to more aggressive autovectoring and bug 2.

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

The affinity issue was also previously reported, I assume it may be related to Windows CPU groups but I have nothing more than that at tis time. Builds for older CPUs don't support Windows CPU groups and use the old method. CPU groups was never properly tested.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 9, 2023

It's too bad you can't compile, another missed opportunity. More testing is required.

I can (and did it before in issues) @ Linux build-in env. Setting up gcc11 build env in Windows natively will be... omg. There's something like this https://sourceforge.net/projects/gcc-win64/files/12.2.0/ but it may lead to re-writing of cpuminer-opt...

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

Affects only cpuminer-avx2-sha, cpuminer-avx2-sha-vaes and cpuminer-avx2 builds. Simple avx build is OK.

Statistically there is a 50% chance the address will be aligned in any build.

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

I don't know anything about that sourceforge link but the MSys/MingW procedure is straightforward.
https://github.com/JayDDee/cpuminer-opt/wiki/Compiling-from-source
Scroll down to the easy way. Installing MSys takes some time but once setup it's a breeze.

About the affinity issue, try to see what triggers it and what CPU and core counts are involved. Also when the error occurs are all threads in error or just some?

@YetAnotherRussian
Copy link
Author

Installing MSys takes some time but once setup it's a breeze.

So is GCC v 12.2 (and it's libs) suitable?

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

So is GCC v 12.2 (and it's libs) suitable?

Yes, I use it. I think it's the default install now. You can also install gcc-11 and g++-11. Those 2 additional packages should get everything you need to compile with gcc-11. To use the non-default version you can set the following env vars before building...

$ export CC=gcc-11
$ export CXX=g++-11

I appreciate the effort. It makes more work for me too but I don't like mysteries.

If you get msys up and running here's a suggested test plan:

You'll have to watch for a randomly aligned pointer, ie no misaligned log. If a particular build has a naturally aligned address it is useless for testing. Try different AVX2 build until you find one with the misaligned pointer. If you can't find one, particularly with gcc-12, it may be a sign of a fix. A newer gcc-11 may also have a fix. With 3 builds to choose from, if none of them are misaligned there's a 1 in 2**3 chance it's not fixed and is just luck.

  1. test with gcc-11. I expect a crash if the pointer is misaligned.
  2. test with gcc-12. Maybe it's fixed already.
  3. test with gcc-9 as a control. Expect no crash with misaligned pointer as initially reported.

@JayDDee
Copy link
Owner

JayDDee commented Mar 9, 2023

Some other notes about affinity. I have a Windows10 with 8 thread CPU with cpu groups enabled and haven't seen this error. Did you configure CPU groups differently or do you have a setup, like dual socket, Windows Enterprise, NUMA, that may have a different default CPU group configuration? This is going beyond my knowledge so I don't know how far I can take it.

Edit: I'm looking for a debug log describing the number of cpu groups found but I don't see it in any of your posts. It requires -D.

Edit2: It appears there have been changes made to CPU groups in Win11. Are you using Win11?
https://bitsum.com/general/the-64-core-threshold-processor-groups-and-windows/

@YetAnotherRussian
Copy link
Author

My system is a single-CPU one, Win 10 Pro 21H1.

CPU and mb:
image
image
image
image

Nothing special (yes, that CPU is a real rarity, but is basially 5700G with lower TDP, slightly lower multi-core performance and a bit better single-core performance - check here in case of any questions https://www.cpubenchmark.net/compare/4323vs4387/AMD-Ryzen-7-5700G-vs-AMD-Ryzen-7-5700GE).

Groups in cpuminer-opt:
image

I see some changes here https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support (scroll down to "Behavior starting with Windows 10 Build 20348" section), but mine is 19043

@JayDDee
Copy link
Owner

JayDDee commented Mar 10, 2023

The debug log I'm looking for is still missing. It should be displayed before the cpu capabiliilties.
If you get "Binding thread n to cpu n in group n" you should have gotten "Found n cpus in cpu group n".
You get one but not the other. This should be something I can reproduce, I'll play with it a bit to see if I can.

Edit: I did a quick test and I reproduced the log problem but not the affinity error. It seems that's a seperate problem.
Do you have any insight when the error occurs and when it doesn't? Your CPU isn't unusually big so I don't see an obvious reason why the error hasn't been seen before.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 10, 2023

I don't see an obvious reason why the error hasn't been seen before

As for me, I do not usually bruteforce an optimal build, so it's just no way I use cpuminer-avx2-sha build if cpuminer-avx2-sha-vaes one works okay.

The debug log I'm looking for is still missing. It should be displayed before the cpu capabiliilties.
If you get "Binding thread n to cpu n in group n" you should have gotten "Found n cpus in cpu group n".
You get one but not the other. This should be something I can reproduce, I'll play with it a bit to see if I can.

I've never ever seen "Found n cpus in cpu group n" even with "-D"

I think you mean this:

image

If it's not shown with such a CLI string, then something is not defined, ripped-off by compiler or log level.

image

I've compiled from git in Linux subsystem - no such message:

image

@JayDDee
Copy link
Owner

JayDDee commented Mar 10, 2023

The missing log doesn't seem to be related to the affinity errors, it just provides more details about number of groups and group size. The error when setting affinity is the real problem, it hasn't been reported before.

@JayDDee
Copy link
Owner

JayDDee commented Mar 10, 2023

I think we may be losing focus on the issues discussed here, there are so many. I'll summarise.

  1. max-difff broken, fix is ready
  2. myr-gr broken stats with old build, fix is ready
  3. Misaligned work pointer using GBT, not reproduceable, need more data
  4. Affinity not working on Windows with CPU groups enabled, not reproduceable, need more data
  5. Missing debug log on Windows with CPU groups, reproduced, I will folllow up.

3 & 4 is where I need help. I can't reproduce those errors.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 10, 2023

3 is on a small pause, I cannot sync that wallet since today :( I got another one, but it should take up to a day to sync...

To reproduce 4, launch cpuminer-avx2-sha-vaes.exe with -t 8 --debug --cpu-affinity 0x5555 -a sha256d --benchmark and see:

image

Good.

Then, launch cpuminer-avx2-sha.exe with -t 8 --debug --cpu-affinity 0x5555 -a sha256d --benchmark and see:
image

Bad.

You won't need wallets, solo etc. for a bench. Are you able to reproduce on your Win version & env?

UPD: I'm currently syncing to get 3.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 13, 2023

I wonder is the mingw symbol table would help,

Possibly. I don't know.

Link:
https://mega.nz/file/iuQQBLIJ#GnJtxJeaq6HmZZ0ssvLwyTIGwDWRi_fwpcxNgv8cqRY
Pass: cpuminer

I got an idea to stop the wallet, launch cpuminer-opt from debugger, then open wallet => "delayed" crash on getting work should happen.

UPD.: Not so useful

image

image

@JayDDee
Copy link
Owner

JayDDee commented Mar 13, 2023

UPD.: Not so useful

TLDR: last minute change jump to bottom.

I don't know what code this is but it looks string related. RtlReportCriticalFailure is heap corruption. rtlIsAnyDebuggerPresent is the kernel debugger, and from it's address the function isn't very far from this code. It looks like kernel code checkimg string integrity

The only string related changes were to a couple of logs, but struct work has strings as well as arrays. I assume that arrow is where it crashed. It's a software interrupt, int 3 is a breakpoint. That measn the previous "test esi,esi" failed (I assume non-zero) and fell through to the interrupt.

rtlIsZeroMemory, in the previous function is interesting. This popped up from a search...

https://stackoverflow.com/questions/24183982/error-from-ntdll-dll-because-of-a-malloc-c

Reading through it I wonder if the heap manager was confused where the block actually started due to to the way mm_malloc works. If the heap manager assumed standard alignment when the bkock was actually aligned differently it would definitely cause a crash.

Maybe useful after all.

Everything seems to be pointing to mm_malloc now.

Using 2 pointers, block pointer is returned by malloc, struct pointer is block pointer adjusted for alignment. Block pointer is used by free, struct pointer is used by the application. The heap manager should only see the block pointer, only the local function will be aware of the struct pointer.

Edit:
The point of deviation seems to be which Windows kernel is being used, The binaries package will obviously use Windows own proprietary kernel, but emulators like CygWin usually have their own version of a Windows kernel. WSL I don't know.

Edit:
I have a slighly new theory. ret_work (a pointer) gets passed around using a list. It gets picked up by get_work as work_heap and copied to g_work. work_heap is the block pointer, not the adjusted struct pointer. if they are not equal the data will be corrupted and acrash is imminent. work_heap pointer could be similarly adjusted to calculate the struct pointer of a misaligned block.

I wouldn't do this with mm_malloc, I'd go back to the original calloc and write a common aligner utility, as previously described, so they'd be in sync. Only mining code would know about the struct pointer while tq, heap etc would be blissfully ignorant.

I assume all the list software is type agnostic only having a pointer and size.

Edit: I'm feeling pretty good about this. The only concern is it's based on the assumption there is an issue between mm_malloc and Windows kernel. By managing the pointer myself I hope to workaround any such issue. The kernel will never see the aligned pointer and miner code will always used the aligned pointer.

Edit: can you look this over? I want to make sure I don't make a silly logic error with align_ptr.
struct work type definition will also use WORK_ALIGNMENT instead of the current literal 64.
I think I'll also make the 2 aligned pointers named the same for clarity.

miner.h
#define WORK_ALIGNMENT 64
// block size must be padded by alignment number of bytes, don't use returned pointer to free.
static inline void *align_ptr( void *ptr, uint64_t alignment )
{
const uint64_t mask = alignment - 1;
return (void*)( ( ((uint64_t)ptr) + mask ) & (~mask) );
}

workio_get_work pushes heap_ptr
heap_ptr = calloc( 1, sizeof(struct work) + WORK_ALIGNMENT );
if ( !heap_ptr ) return false;
ret_work = (struct work*) align_ptr( heap_ptr, WORK_ALIGNMENT );

get_work pops heap_ptr
heap_ptr = (struct work*) tq_pop(thr->q, NULL);
if ( !heap_ptr ) return false;
// Align heap pointer
work_heap = (struct work*) align_ptr( heap_ptr, WORK_ALIGNMENT );
/* copy returned work into storage provided by caller */
memcpy( work, work_heap, sizeof(*work) );
free( heap_ptr );

Start here:

Edit: If this doesn't work my only option is to disable loop auto vectorization in gbt_work_decode and revert to the original code.

Edit: Actually the last option might be the best option. I can hand code the loop with the same number of SSSE3 instructions as the compiler does with AVX2. SSE2 takes a couple more. It's also functionally identical to memrev function that was imported for segwit. Hand coding should override auto-vectorization eliminating the need for 32 byte alignment. It looks like it might be a win-win. I need to sleep on it, don't want to make a decision while tired.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 14, 2023

I'm not sure but it may be a good idea to make a copy of cross build script with "-ggdb" option, and each time you publish .zip archive there'll be a copy of it (debug builds). So, any future release could be debugged easier and faster. New issue = detailed info from the beginning. Good thing is that it would be compiled with all the versions you use, not some random ones like GCC 12 or something. It is also being practiced by some coin devs with their wallets (if I mean devs, I mean devs, not a shipcoin copycats). As you use -j 8 now, compiling a second pack should be very fast.

I could make a small documentation on installing and using ggdb in Windows then (through a pr here). Installing VS to perform such things is a real waste.

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

The puzzle is coming together. Stratum does not crash because there is no similar loop in stratum. First, stratum calculates the target from nbits, gbt provides the target explicitly. Second stratum copies data directly to g_work without the intermetiate dynamically allocated, misaligned, temp work struct.

Confidence is rising.

Even though stratum can't test the optimized hardcoded memrev replacement, I can write a little test routine to do a dummy copy with test data and verify correct functionality.

Regarding your ggdb suggestion. It might be a good idea, at least for the next release or until this issue is resolved.
For the longer term, real devs might be more inclined to compile themselves rather than use the prebuilt binaries.

I am however, interested in WSL as another way to use cpuminer-opt on Windows. Any special procedures there?

@YetAnotherRussian
Copy link
Author

Any special procedures there?

Well I just use the ones you can get with "wsl --list --online" from PowerShell. Affinity from there works OK. Another useful thing is that when you type "explorer.exe ." from your root there, you Linux filesystem gets opened inside Windows explorer. So there's no need to mount, use some network drives or http servers to share files.

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

Test results

Test Code (printf deleted for brevity):
applog(LOG_INFO, "End start\n");
static const uint8_t in[32] = { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 };
static uint8_t out[32];
// printf in
*((__m128i*)out ) = mm128_bswap_128( *((__m128i*)in+1) );
*((__m128i*)out+1) = mm128_bswap_128( *((__m128i*)in ) );
// printf out1
for (int i = 0; i < 8; i++ )
((uint32_t*)out)[7 - i] = be32dec( (uint32_t*)in + i );
//printf out2
applog(LOG_INFO, "End test\n");

Results, tested with AVX2, AVX & SSE2, all good:
[2023-03-14 10:47:31] Start test
in : 1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00
out1: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
out2: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
[2023-03-14 10:47:31] End test

mm128_bswap_128 is new, implemented differently for SSE2 and SSSE3.

#if defined(__SSSE3__)
#define mm128_bswap_128( v ) \
_mm_shuffle_epi8( v, m128_const_64( 0x0001020304050607, 0x08090a0b0c0d0e0f ) )
#else
#define mm128_bswap_128( v )
mm128_swap_64( mm128_bswap_64( v ) );
#endif

Edit: I'm going to have to use this for memrev because it is also used with the heap and could be autovectorized.

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

Do you have a public dropbox at mega? I could drop a new [debug] binaries package instead of publishing a release?
I have google drive but don't use it, kept flagging cpuminer as malware.
Use email for privacy if desired [email protected].

Or maybe I can attatch it to an email, it a little big though.

@YetAnotherRussian
Copy link
Author

Just make a pass-protected zip or 7z archive and upload to https://www.file.io/

Small file, no need for privacy.

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

I'm going to reintegrate the changes from v3.21.3, minus the prehash centralization. It was an experiment, a proof of concept that didn't work out as expected. It was intended to improve the scaling to large numbers of threads but actually had a negative effect in some cases.

I don't think I'll build with -ggdb. It could change the build in a way that invalidates the testing. I'm confident now the problem is solved.

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

https://file.io/3nFv5PwMc9dI
pw = align
It's a duplicate of the file I intend to publish, only password protected.
Currently testing on 2 Linux and 1 Windows binaries.

I left the misaligned log available but it will be removed later. Misalignment of the data is normal.

Edit: I'm rebuilding for publishing. I forgot to update the RELEASE_NOTES. I also removed the misalign log . Your testing is not affected.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 14, 2023

I see the crash is solved. I just need some time (~8-10 hours) to find some different algo blocks to confirm. That's about the build from that archive. I've put 2 coins w/ 4 algos in total. All of 'em with 2 threads and affinity.

I won't be able to test avx512 builds - no CPU.

UPD: I now see "Net diff 0"
image

It was not in v3.21.3. Miner TTF calculation is now wrong:

image

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

Net_diff 0 was a stupid mistake, deleted too many lines. I only need gbt tested, arch doesn't matter.
Everything else is tested.
Net TTF needs net_diff, if it worked in v3.21.2 it will work with net_diff fixed.

Edit: Are you using segwit? There are 2 byte reversal loops in segwit code that I also optimized and made alignment agnostic.
That should cover everything in GBT that I can't test.

You can retest the subissues if you want like groestl with arch < AVX2, affinity with arch >= AVX2, conditional mining any arch.
You should nothing about CPU groups unless you compile from source with it.

@YetAnotherRussian
Copy link
Author

Are you using segwit?

Seems to be yes, it is being showed in log. I'm not sure about how it's configured in wallet or network(s)

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

Any other issues with the logs that I might be able to address now? We already discussed net hashrate with multiple algos,
same issue as nethashrate with multiplc coins in a pool, but more difficult to hide. My opinion is it's a coin problem if it's incorrect.

Anything else? Anything from last stable release v3.21.2?

Otherwise I'm satified and ready to release.

@YetAnotherRussian
Copy link
Author

Anything else?

I've set the CLI output as ">> log.txt" so will read tomorrow. It's almost night in my time zone.

@JayDDee
Copy link
Owner

JayDDee commented Mar 14, 2023

Have a good night and thank you.

Edit: If segwit is enabled it means the wallet wants it. It also means the test provided the necessary code coverage to pass.
With netdiff broken you can't test max-diff.

Edit: did a little reading about WSL and it seems to confirm the NT kernel was at the root of the heap corruption crash. WSL2 (I assume you were using WSL2) has a real Linux kernel. The Windows Binaries package obviously uses the Windows kernel.

This seems like a good choice for the preferred way to run cpuminer-opt on Windows. First I won't need to build binaries anymore, users can build for their specific architecture, it's Windows software and works mostly from a Windows perspective so it won't chase away linux-phobes. With native performance I can't think of any negatives.

Edit: I'm mostly interested in the new block log for GBT, I rarely ever see one. Most of the others I can test with stratum. Diff targetting is different for GBT but it takes a very large sample to test the target threshold, and a very long time to obtain a suitable sample of blocks solo. There aren't any significant changes to any algos except those noted in the release. Luffa had a tweak but it would only affect AVX512 X16R family when Luffa is first in the hash order. I have previously tested it.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 15, 2023

log.zip

Btw I see nothing interesting... Please not the time and zone are wrong

@JayDDee
Copy link
Owner

JayDDee commented Mar 15, 2023

The only thing I noticed are TTF 0 and solutions aren't reported as BLOCK SOLVED. I suspect they're both related to netdiff being zero. I don't expect net TTF to be correct for multi-algo coins because it's calculated using the current algo's diff and the total hashrate of all algos.

If all found blocks were accounted in the wallet all is good.

I've submitted v3.21.5, The netdiff issue should be fixed now.

@JayDDee
Copy link
Owner

JayDDee commented Mar 15, 2023

Two thing of interest in this extract from the log.

The hash was rejected for unknown reasons. The hash was below target so if wasn't low diff. However, the reject was immediately followed by a new block so either you got it or someone else did just before you, in other words stale.

The other thing is the debug log trying to report Xnonce2, there's no xnonce in GBT. This log is unnecessary. The nonce is output is in the blockheader data of another debug log. The xnonce2 value can also be included in th eother log for stratum. The thread was never of any real value but the log include the lane which was very useful during 4way development.
Removing that log will reduce the debug log noise level a bit.

[2023-03-15 11:01:42] 16 Submitted Diff 0.00059184, Block 3422330, Ntime 627b1164�[0m
[2023-03-15 11:01:42] Thread 4, Nonce 7c050080, Xnonce2 �[0m
[2023-03-15 11:01:42] Data[0:19]: 04049b00 d8abe317 9a25841b 424d39e3 fb150e25 60f791f6 9e4695e5 33946c3f a6a40557 aa577dd4�[0m
[2023-03-15 11:01:42] : 8d666413 f1ff1ad7 bafc0381 ee01a29c a9dac0f0 ddbf615b d8939f22 627b1164 feb00b1e 7c050080�[0m
[2023-03-15 11:01:42] Hash[7:0]: 00000699 a536f62f 83955dbe 56d4bd8b 02a64653 63231bf4 2b26ce59 12686867�[0m
[2023-03-15 11:01:42] Targ[7:0]: 00000bb0 fe000000 00000000 00000000 00000000 00000000 00000000 00000000�[0m
[2023-03-15 11:01:42] 16 �[0mA15 �[0mS0 �[01;31mRejected 1 �[0mB0�[0m, 58.394 sec (6ms)�[0m
�[33m Reject reason: inconclusive�[0m
[2023-03-15 11:01:42] Mining info: diff 0.00035081, net_hashrate 306496393718.552430, height 3422330�[0m
[2023-03-15 11:01:42] GBT new work received in 1.00 ms�[0m
[2023-03-15 11:01:42]�[36m New Block 3422331, Net Diff 0, Ntime 667b1164�[0m
Miner TTF @ 2646.04 h/s 0m00s, Net TTF @ 306.50 Gh/s 0m00s�[0m
[2023-03-15 11:01:42] Threads restarted for new work.�[0m

Edit: the logs also prove the misaligned crash is solved: AVX2, misaligned pointer, and no crash.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 15, 2023

in other words stale.

Yes, stale. I got these before, checked some timestamps on different chains, were confirmed as stales. I've forgot about them.

I've submitted v3.21.5, The netdiff issue should be fixed now.

Great! I'm on it. Need some time to confirm everything is okay. "Block solved" message is on it's place:

image

@JayDDee
Copy link
Owner

JayDDee commented Mar 15, 2023

Looks good, miner TTF looks reasonable, net TTF is zero due to the extremely high shad256d hashrate.

A little hint about target and diff. GBT provides a precise target hash, the net_diff is calculated from it.
Stratum provides the diff encoded in the nbits field in the blockheader, the net_diff and the target are calculated from that.

@JayDDee
Copy link
Owner

JayDDee commented Mar 16, 2023

It feels good to close this issue. It was a leaning experience.

@JayDDee JayDDee closed this as completed Mar 16, 2023
@JayDDee
Copy link
Owner

JayDDee commented Mar 16, 2023

In your last screenshot I noticed that mining info log reported a different net diff than new block log, also a different block height.
Besides the mining info data being stale I don't know if the diff changed or if they are reported different for the same block.
Look for corresponding new block log and mining info log for the same block to confirm the net diff.
If they are different what happens to the shares in between is interesting. It could take a long time to find such shares.
I predict if mining info reports lower diff those shares would be submitted but only reoported as accepted but would in fact solve a block. If the new block reports lower diff I expect no effect.

The provided target is used by GBT to test candidate hash for submision, new block netdiff is used to estimate TTF and grade a share as block solved.

I you see any discrepencies feel free to open another issue. It's not complicated, I can switch to use mininginfo net diff instead.
I may end up doing that anyway for GBT, but it would help to have some supporting data. It would also help to determine if my math is introducing the error.

@YetAnotherRussian
Copy link
Author

In your last screenshot I noticed that mining info log reported a different net diff than new block log, also a different block height.

image

"blocks" in wallet vs height (in cpuminer-opt)

So the new "job" is "blocks + 1":

image

Hope that helps.

@JayDDee
Copy link
Owner

JayDDee commented Mar 16, 2023

Hope that helps.

I Need the new block log for the same block as getmininginfo so I can compare the diff.

Edit: Another possibility is that diif in getmininginfo is also for the previous block. That needs to be confirmed.
If the getmininginfo diff matches the diff in the new block log with the same block height then everything is ok, if they don't match I need to follow up.

I also got WSL working. It creates a VM with a network connection to the host. That's how it can bypass the Windows kernel.
With full CPU compatibility and speed it seems like the preferred way to go. Only issue is the lack of CPU temp.
Edit: affinity doesn't seem to work on WSL. With -t 1 --cpu-affinity 1 the thread moves around to different cores. I won't be following up.

@YetAnotherRussian
Copy link
Author

YetAnotherRussian commented Mar 17, 2023

diif in getmininginfo is also for the previous block

It is.

affinity doesn't seem to work on WSL

My common settings ("-t 4 --cpu-affinity 21760", "-t 4 --cpu-affinity 85" or "-t 8 --cpu-affinity 0x5555") do work well.

Is your version WSL1 or WSL2? (wsl --status)

@JayDDee
Copy link
Owner

JayDDee commented Mar 17, 2023

diif in getmininginfo is also for the previous block

It is.

I assume they match the calculated diff for the same block so there is no issue?

affinity doesn't seem to work on WSL

My common settings ("-t 4 --cpu-affinity 21760", "-t 4 --cpu-affinity 85" or "-t 8 --cpu-affinity 0x5555") do work well.

Is your version WSL1 or WSL2? (wsl --status)

wsl2, Win10, ubuntu 20.04 (default). I tried to set distro to ubuntu-22.04 but it wouldn't accept the -D option.
I tested -t 1 --cpu affinity 1 and the load floated around to other cores. I assume it's the VM doing this.
I also assume the lack of affinity is causing my observed 10% performance drop over either mingw builds.
The affinity is set on guest Ubuntu kernel, not the host Windows kernel ulnless WSL explicitly passes it along to the host.
The CPU on the test system for Windows has a hot core so I typically use t 3 with 0xa2 to keep the hot core idle and cool.
This works fine with the binaries or MSys2 builds. I used coretemp to monitor physical core temps to determine load.

Edit: 0x5555 is overkill for -t 4, that's 8 bits set. It works because the excess bits are ignored but 0x55, binary 01010101, is correct.

@YetAnotherRussian
Copy link
Author

I assume they match so there is no issue?

Yes, no issues here.

0x5555 is overkill for -t 4

No, it's for -t 8, check above :)

@JayDDee
Copy link
Owner

JayDDee commented Mar 17, 2023

Thanks. It's nice to confirm the accuracy of converting hash to diff.
I was previously able to test diff_to_hash and hash_to_diff together by repeatedly converting back and forth to see if error would creep in. Now I have a point of reference proving hash_to_diff_is correct, so diff_to_hash is also correct. This is critical for stratum because diff_to_hash is used for share submission decisions. For GBT it only affects log reporting of solved blocks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@YetAnotherRussian @JayDDee and others