[1.8.0] "No active pools" with dozen of them #194

electroape · 2018-10-17T15:02:40Z

I noticed something like this with previous versions, but with 1.8.0 the issue is on another level.

[2018-10-17 19:47:16] * POOL #1: $PROXY1$:3333
[2018-10-17 19:47:16] * POOL #2: $PROXY2$:3333
[2018-10-17 19:47:16] * POOL #3: $PROXY3$:3333
[2018-10-17 19:47:16] * POOL #4: de01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #5: fr01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #6: at01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #7: hk01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #8: de02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #9: fr02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #10: at02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #11: hk02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #12: pool.supportxmr.com:3333
[2018-10-17 19:47:16] * CC Server: $PROXY1$:3344
[2018-10-17 19:47:16] * COMMANDS: hashrate, pause, resume, quit
[2018-10-17 19:47:16] Starting thread 1/3 affined to core: #0 -> huge pages: 1/1 scratchpad: 2.0 MB
[2018-10-17 19:47:16] Starting thread 3/3 affined to core: #1 -> huge pages: 1/1 scratchpad: 2.0 MB
[2018-10-17 19:47:16] Starting thread 2/3 affined to core: #2 -> huge pages: 1/1 scratchpad: 2.0 MB
[2018-10-17 19:47:16] use pool $PROXY1$:3333
[2018-10-17 19:47:16] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 19:47:19] accepted (1/0) diff 3000 (1 ms)
[2018-10-17 19:47:38] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 19:47:39] accepted (2/0) diff 3000 (3 ms)
[2018-10-17 19:47:53] SIGHUP received, exiting
[2018-10-17 19:47:53] no active pools, stop mining
[2018-10-17 19:47:53] [$PROXY1$:3333] Error: "[Read] Операция ввода/вывода была прервана из-за завершения потока команд или по запросу приложения"

That is, on proxy restart the miner stops mining entirely and doesn't resume. Strangely enough, I have a few machines that stop mining too (no pools, stop mining) but resume when the main proxy comes back online.

[2018-10-17 17:49:09] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 17:49:10] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 17:49:18] [$PROXY2$:3333] Error: "[Connect] �� , �.�. �� , �� -�� "
[2018-10-17 17:49:19] [$PROXY1$:3333] Error: "[Read] End of file"
[2018-10-17 17:49:19] no active pools, stop mining
[2018-10-17 17:49:19] [$PROXY1$:3333] Error: "[Read] �� /�� -�� "
[2018-10-17 17:49:20] use pool $PROXY1$:3333


Bendr0id · 2018-10-17T15:07:09Z

Sorry, but there hasn't been a single change to the network code since 1.6.x.

I restart my proxy several times an hour without issues. With such a fucked-up log file I can't help you.

Bendr0id · 2018-10-17T15:09:04Z

Maybe you're using a newer boost/GCC/updated Visual Studio or other libs than in your old build.

electroape · 2018-10-17T15:10:05Z

I don't know why the log got pasted that way :) My locale is Russian; that error is something like "IO operation was cancelled".

electroape · 2018-10-17T15:13:22Z

As I said, I've noticed miners getting stuck like that on previous builds. My environment hasn't changed other than the miner and CC server versions. But I've restarted the proxy before (on earlier versions) and it wasn't that reproducible.

electroape · 2018-10-17T15:15:38Z

Falling back to the older version for now; I will try to reproduce it with an English locale.

djfinch · 2018-10-17T15:17:51Z

Is it Snippa's, MoneroOcean or some other fork? I saw this multiple times and it was always (in my case) a proxy issue. I'm running multiple 1.8.0 miners atm and everything works well (MO proxy with some amendments on rPi2/bionic/docker container).

electroape · 2018-10-17T15:21:08Z

It isn't a proxy issue: the miner has plenty of backup pools, it just stops doing anything when it encounters that error on the main pool. But the proxy is xmrig-proxy, if that's somehow relevant...

Bendr0id · 2018-10-17T15:27:53Z

Without a proper log I can't help you. But to be honest, the only change in 1.8 is the algorithm. Nothing else. So if the problem exists, it is in all versions since 1.6.

I would bet that if I created a new build with just the version incremented, there would be people saying the old one was better.

electroape · 2018-10-17T15:28:23Z

Okay, I've reverted the miners to 1.7.0. They stop mining when the main proxy goes offline, with the same error message but without "no pools, stop mining", and they don't switch to backup pools either. But when the main pool comes back online they resume mining; that's the difference from 1.8.0.

electroape · 2018-10-17T15:30:47Z

I published log on pastebin > https://pastebin.com/tqEpdSkE

Bendr0id · 2018-10-17T15:31:13Z

When they lose the connection to the proxy they have to stop, because everything they mine is pure waste. When the pool connection is dropped, it will try again after some time.

How long did you wait?

Bendr0id · 2018-10-17T15:32:54Z

I don't see a problem in your log? It starts mining after a while.

Btw, I still just see ?????? in the log.

electroape · 2018-10-17T15:34:01Z

Of course they stop when they lose the connection to the pool, but they don't switch to any backup pool either. There was probably a 5 min window between when I restarted the proxy and when I noticed that the miners weren't doing anything. And the miner config is 5 retries with a 1 sec timeout.

electroape · 2018-10-17T15:35:05Z

Oh, forgot to mention: that's the 1.7.0 behaviour, it resumes mining; 1.8.0 doesn't.

I'll get home in an hour and try to reproduce it again with Russian and English locales. Thanks for your attention.

djfinch · 2018-10-17T16:00:01Z

Sorry for pointing at the proxy, but the log is literally unreadable... It might be something with boost, but something similar was in xmrig <2.5.3, where the miner couldn't recover the connection or switch to a failover pool, and there is no boost used in the original xmrig, so...

electroape · 2018-10-17T17:53:23Z

Full log > https://pastebin.com/vN0EV0e0

To trigger the bug you need to configure at least 3 pools: pool #1 is online, then goes offline; pool #2 is unresolvable (a nonexistent domain name, an unresponsive IP or a closed port); pool #3 is the fallback pool.

When the miner first connects to pool #1 and that pool then goes offline, the miner gets stuck at pool #2 and doesn't go any further, suspending mining indefinitely (even when it's configured with 1 retry and a 1 second retry-pause). Note that if pool #1 is offline at startup, the miner successfully connects to fallback pool #3, skipping unresolvable pool #2; but when pool #1 comes online and then goes offline again, it still gets stuck at #2.
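
The scenario above can be modelled in a few lines. This is a minimal sketch, not xmrigCC's actual failover code: the `PoolState` enum, the `next_pool` function and its `skip_unresolvable` flag are hypothetical names, and the model covers only the failing transition (pool #1 drops after a successful connection).

```python
from enum import Enum


class PoolState(Enum):
    ONLINE = "online"
    OFFLINE = "offline"            # resolves, but refuses or times out
    UNRESOLVABLE = "unresolvable"  # DNS lookup fails


def next_pool(states, skip_unresolvable):
    """Return the 1-based number of the pool the miner ends up on, or None if stuck.

    `states` lists the pools in priority order. With skip_unresolvable=False the
    walk stops at the first unresolvable entry, which models the reported hang.
    """
    for number, state in enumerate(states, start=1):
        if state is PoolState.ONLINE:
            return number
        if state is PoolState.UNRESOLVABLE and not skip_unresolvable:
            return None  # modelled bug: this entry is retried forever
    return None  # every pool was tried and none is reachable


# Pool #1 has just gone offline, pool #2 cannot be resolved, pool #3 is fine.
after_drop = [PoolState.OFFLINE, PoolState.UNRESOLVABLE, PoolState.ONLINE]
print(next_pool(after_drop, skip_unresolvable=False))  # prints None (reported behaviour)
print(next_pool(after_drop, skip_unresolvable=True))   # prints 3 (expected behaviour)
```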

Hope that helps.

djfinch · 2018-10-17T20:13:43Z

So... I'm able to partially replicate this...
Testing env:

Test 1:
Pool 1 : xmr-node-proxy (XNP)
Pool 2 : example.com (dummy)
Pool 3 : supportxmr
Result: NOT GOOD. XNP is killed --> miner is stuck in a loop trying to reach Pool 2, which does not exist, and ignores Pool 3. Anyway, running the proxy again will resurrect the miner.

Test 2:
Pool 1 : xmr-node-proxy
Pool 2 : supportxmr
Pool 3 : example.com (dummy)
Result: GOOD. XNP is killed --> miner switches to Pool 2. XNP is started --> miner connects back to the proxy. That's the expected behavior.

The 2nd test should work for you, too...
I saw in your log that the proxy pool was first and supportxmr second. You successfully got multiple jobs from supportxmr, so it works! However, I think your 25 H/s miner is not able to hash 5000 diff, which is the min-diff there.
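
For scale, here is a rough estimate of how often such a miner would find a share. It is a sketch under the usual assumption that every hash independently meets the share target with probability 1/difficulty; `expected_seconds_per_share` is an illustrative helper, not part of any miner or pool.

```python
def expected_seconds_per_share(difficulty, hashrate):
    """Average time between shares for a miner doing `hashrate` hashes per second."""
    if hashrate <= 0:
        raise ValueError("hashrate must be positive")
    return difficulty / hashrate


print(expected_seconds_per_share(5000, 25))  # 200.0 (supportxmr, diff 5000)
print(expected_seconds_per_share(3000, 25))  # 120.0 (proxy, diff 3000)
```

Under that assumption, a 25 H/s miner would submit a diff-5000 share about every 200 seconds on average, so accepted shares would be sparse in a short log rather than absent.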

And how is it possible that restarting the proxy does not help and force the miner to connect back? I don't know. Maybe it's stuck and occupying a port, but I cannot reproduce this. Anyway, when Pool 1 is available again, the miner will switch. Every time. Even if the 2nd pool is a dummy. Proof is above... So, the 1st example (dummy pool is 2nd) is definitely an issue. The 2nd example (2nd pool is working and the dummy is 3rd) should work everywhere, and your issue can be caused by env, boost, compiler, proxy, occupied port, weather, I don't know.

electroape · 2018-10-17T21:05:39Z

Thanks for the tests. Your 2nd test will not have this bug; I didn't test that case either.

As for resuming mining after the main pool goes offline and online again, I actually forgot to test that; I was too focused on the issue that I was seeing even on older versions (tested above). Will test that case now.

Bendr0id · 2018-10-17T21:11:06Z

I'll record tomorrow some videos. But what I tested so far, 1.7 is behaving exactly the same way 1.8 does.

So this has to be a coincidence

electroape · 2018-10-17T21:30:52Z

Silly me. The reason I created this ticket is that after a proxy restart a major part of my miners was gone from the dashboard, so I assumed they had stopped mining, but that actually means that they either crashed or cannot connect to / were refused by the CC server. I'll try to reproduce this now.

So there are actually 2 separate bugs. One is the indefinite loop on an unresolvable pool, where the miner just stops mining; this bug has been here for at least two releases prior to 1.8.0, I just wasn't bothered enough to investigate it until now. The second one is still in question.

electroape · 2018-10-17T21:41:54Z

Okay, I upgraded to 1.8.0 again and restarted the proxy and CC server several times; it's not reproducing. So there's only one bug for now: the miner can't fall back to pools further down the list after an unresolvable one.

"I'll record tomorrow some videos. But what I tested so far, 1.7 is behaving exactly the same way 1.8 does."

Exactly, this bug was there in 1.7.0 too :) Tell us if you still aren't able to reproduce it.

Bendr0id · 2018-10-17T21:45:29Z

I have the same results as @djfinch.

You're the one who said it was working on 1.7. I always said that 1.7 does the same as 1.8, and that's still valid.

electroape · 2018-10-17T21:53:58Z

I didn't quite get it: you don't consider the behaviour in @djfinch's test #1 a bug?

As for the second bug (miners disappearing from the dashboard for an unknown reason), I will watch closely and report if I stumble upon it again. I had per-miner logs disabled, so I couldn't see what was going on client side; I've enabled them now and it will be easier to see what's going on.

Bendr0id · 2018-10-17T22:00:39Z

I don't know what's misleading...

"I have the same results as @djfinch."

In other words: "I see the issue he was able to reproduce too". I was able to reproduce that.

But this is still a corner case. I will look into it, though.

Why the hell aren't you fixing your proxies? Maybe you have DNS issues? I restart my proxies 50 times a day when hopping coins. And I don't have these issues...

electroape · 2018-10-17T22:31:25Z

Okay, got it.

That pool config is kind of a legacy workaround: my gateway was acting weird and I added an alternative IP as well. I don't remember exactly what happened.

Never mind, I'll remove these unresolvable pools for now as a workaround and report if there are problems with restarting the proxy/CC.

PS: 50 times a day? Holy cow... I'm not that dedicated :) Btw, how's progress with the proxy integration?

Bendr0id · 2018-10-18T10:04:34Z

Can you please test this branch

https://github.com/Bendr0id/xmrigCC/tree/proper_handling_of_dns_issues

The cases described by @djfinch work now. DNS issues are now handled like normal connection errors, and the miner jumps to the next pool. It keeps retrying, and once the main pool is back again it jumps back to it.
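
As an illustration of "handled like normal connection errors", here is a minimal Python sketch (not the code from the branch). In Python's `socket` module a failed DNS lookup raises `socket.gaierror`, which is a subclass of `OSError`, so a single `except OSError` treats it the same as a refused or timed-out connection. `try_connect` and `first_reachable` are hypothetical helper names.

```python
import socket


def try_connect(host, port, timeout=1.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # socket.gaierror (DNS failure) is a subclass of OSError, so a failed
        # lookup ends up here together with refused connections and timeouts.
        return False
    conn.close()
    return True


def first_reachable(pools, **kwargs):
    """Return the index of the first pool that accepts a connection, or None."""
    for index, (host, port) in enumerate(pools):
        if try_connect(host, port, **kwargs):
            return index
    return None
```

The `connect` parameter exists only so the function can be exercised without a network, by passing a stand-in that raises the error of interest.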

Bendr0id · 2018-10-18T19:17:06Z

@uz-spark tested?

electroape · 2018-10-20T14:21:03Z

Uh, sorry, not fixed.

https://pastebin.com/JKFq435K

Bendr0id · 2018-10-20T14:27:38Z

Why? It looks perfect: after 5 attempts it always tries to connect to the next one and keeps trying the others until it is able to connect to one, then it stops retrying. And at the end it is connected to one of your fallbacks.

[2018-10-20 19:11:57] [inexistant_domain_name:6666] Error: "[Connect] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
[2018-10-20 19:11:57] use pool de01.supportxmr.com:3333
[2018-10-20 19:11:57] new job from de01.supportxmr.com:3333 with diff 5000 and PoW 2
[2018-10-20 19:12:00] [$MY_PROXY$:6666] timeout

But it will always keep trying to connect to your primary server. Once it can connect to it, it will jump back to it. All of that is the expected behavior.

[2018-10-20 19:13:40] new job from $MY_PROXY$:6666 with diff 3000 and PoW 2

Can't see an issue here.

electroape · 2018-10-20T14:33:38Z

"Why? It looks perfect: after 5 attempts it always tries to connect to the next one and keeps trying the others until it is able to connect to one, then it stops retrying. And at the end it is connected to one of your fallbacks. But it will always try to connect to your primary server. All of that is the expected behavior.

[2018-10-20 19:13:40] new job from $MY_PROXY$:6666 with diff 3000 and PoW 2

Can't see an issue here."

I started the proxy again to see if the miner would connect to it. It's the main proxy, first in the pool list.

I've retested it with clearer settings. Hashrate is low but difficulty is 100, and although there's still some hashrate even after 'no pools, stop mining', you can see that it doesn't submit anything to the failover proxy.

https://pastebin.com/Zy0sLgnT

Bendr0id · 2018-10-20T14:39:53Z

I tested a lot of cases and it always recovered, at least when the main proxy was back. In your log file, there was a phase where no fallback was responsive. I don't know if that's a real use case.

electroape · 2018-10-20T14:43:07Z

"I tested a lot of cases and it always recovered, at least when the main proxy was back. In your log file, there was a phase where no fallback was responsive. I don't know if that's a real use case."

In that case the failover pool is my proxy on the same machine as the main proxy, just on another port. I don't know why it was unresponsive for a second; maybe the host was too busy (it's on a VM). But I don't see how that's not a real use case: I just replaced the regular pool with my proxy to lower the difficulty so I can see the hashrate more clearly.

electroape · 2018-10-20T14:47:41Z

I mean, I think that's a pretty normal use case:

Pool #1 - your main proxy
Pool #2 - your failover proxy on another host; if you decide to do maintenance on the host with the main proxy you bring your failover proxy online, otherwise it's not needed and stays offline
Pool #3 - a regular pool, just in case both your proxies are offline

electroape · 2018-10-20T14:49:48Z

I'll do a test with the config as close to default settings as possible to see if that's the issue. I don't know why you can't reproduce it.

Bendr0id · 2018-10-20T15:02:36Z

https://pastebin.com/ZiNuptfc

Case 1:
First I turned off my main proxy; the miner jumps to the second proxy after 5 attempts. Turning the first proxy back on, it jumps back to the main proxy.

Case 2:
First I turned off my main proxy; the miner jumps to the second proxy after 5 attempts. Turning off the second proxy, the miner jumps to the fallback. Turning the main proxy back on, it jumps back to the main proxy.

Bendr0id · 2018-10-20T15:06:40Z

https://pastebin.com/zMatn4yH

Case 3:
Main proxy down, 2nd proxy down, jumping to fallback. Turning on first proxy, jumps back to main proxy

Bendr0id · 2018-10-20T15:08:10Z

Btw, the same applies when the 2nd proxy has bad DNS. Just tested it.

electroape · 2018-10-20T15:19:01Z

Here, I retested with an almost default config and with both proxies and the miner on the same machine.

https://pastebin.com/0Gxsv4S9

In your test you don't reproduce my steps. I'll describe them again in detail:

Pools:

Pool #1 - your main pool; it must be online and go offline at some point
Pool #2 - your failover pool; it's offline because it's a failover (huh) and you don't need it for now, at least that's my use case
Pool #3 - any other failover pool

Steps:

1. If pool #1 is online on miner start, the miner successfully connects to pool #3, skipping through unresponsive pool #2.
2. Pool #1 goes offline; the miner loops trying to connect to pools #1 and #2 and doesn't check pool #3 at all.

What you're missing in your test is that pool #2 must already be offline when pool #1 goes offline. Otherwise, even if pool #2 then goes offline too, the miner successfully connects to pool #3, because pool #1 is already unresponsive and the only thing it can do is try the other pools down the list, pool #3 in that case.

electroape · 2018-10-23T07:25:35Z

So, were you able to confirm this or not?

Bendr0id mentioned this issue Oct 19, 2018

Proper handling of DNS issues #197

Merged

Bendr0id closed this as completed Oct 20, 2018

djfinch mentioned this issue Oct 20, 2018

Xmrigcc-1.8.1 will not reconnnect to proxy sever/pool after connection lost. #198

Closed
