[1.8.0] "No active pools" with dozen of them #194

Closed
electroape opened this issue Oct 17, 2018 · 39 comments

Comments

@electroape

I noticed something like this with previous versions, but with 1.8.0 the issue is on another level.

[2018-10-17 19:47:16] * POOL #1: $PROXY1$:3333
[2018-10-17 19:47:16] * POOL #2: $PROXY2$:3333
[2018-10-17 19:47:16] * POOL #3: $PROXY3$:3333
[2018-10-17 19:47:16] * POOL #4: de01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #5: fr01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #6: at01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #7: hk01.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #8: de02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #9: fr02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #10: at02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #11: hk02.supportxmr.com:3333
[2018-10-17 19:47:16] * POOL #12: pool.supportxmr.com:3333
[2018-10-17 19:47:16] * CC Server: $PROXY1$:3344
[2018-10-17 19:47:16] * COMMANDS: hashrate, pause, resume, quit
[2018-10-17 19:47:16] Starting thread 1/3 affined to core: #0 -> huge pages: 1/1 scratchpad: 2.0 MB
[2018-10-17 19:47:16] Starting thread 3/3 affined to core: #1 -> huge pages: 1/1 scratchpad: 2.0 MB
[2018-10-17 19:47:16] Starting thread 2/3 affined to core: #2 -> huge pages: 1/1 scratchpad: 2.0 MB
[2018-10-17 19:47:16] use pool $PROXY1$:3333
[2018-10-17 19:47:16] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 19:47:19] accepted (1/0) diff 3000 (1 ms)
[2018-10-17 19:47:38] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 19:47:39] accepted (2/0) diff 3000 (3 ms)
[2018-10-17 19:47:53] SIGHUP received, exiting
[2018-10-17 19:47:53] no active pools, stop mining
[2018-10-17 19:47:53] [$PROXY1$:3333] Error: "[Read] The I/O operation was aborted because of a thread exit or an application request" (translated from the mis-encoded Russian-locale message in the paste)

That is, on a proxy restart the miner stops mining entirely and doesn't resume. Strangely enough, I have a few machines that stop mining too (no pools, stop mining) but resume when the main proxy comes back online.

[2018-10-17 17:49:09] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 17:49:10] new job from $PROXY1$:3333 with diff 3000 and PoW 1
[2018-10-17 17:49:18] [$PROXY2$:3333] Error: "[Connect] (Russian-locale message, mis-encoded and unreadable in the paste)"
[2018-10-17 17:49:19] [$PROXY1$:3333] Error: "[Read] End of file"
[2018-10-17 17:49:19] no active pools, stop mining
[2018-10-17 17:49:19] [$PROXY1$:3333] Error: "[Read] (Russian-locale message, mis-encoded and unreadable in the paste)"
[2018-10-17 17:49:20] use pool $PROXY1$:3333

@Bendr0id
Owner

Sorry, but there hasn't been a single change to the network code since 1.6.x.

I restart my proxy several times an hour without issues. With such a fucked-up log file I can't help you.

@Bendr0id
Owner

Maybe you're using a newer Boost/GCC, an updated Visual Studio, or other libs than in your old build.

@electroape
Author

I don't know why the log was pasted that way :) My locale is Russian; that error is something like "IO operation was cancelled".

@electroape
Author

As I said, I've noticed miners getting stuck like that on previous builds. My environment hasn't changed other than the miner and CC server versions. But I've restarted the proxy before (on earlier versions) and it wasn't that reproducible.

@electroape
Author

Falling back to an older version for now; I will try to reproduce it with an English locale.

@djfinch

djfinch commented Oct 17, 2018

Is it Snippa's, MoneroOcean's or some other fork? I saw this multiple times and it was always (in my case) a proxy issue. I'm running multiple 1.8.0 miners atm and everything works well (MO proxy with some amendments on rPi2/bionic/docker container).

@electroape
Author

electroape commented Oct 17, 2018

It isn't a proxy issue: the miner has plenty of backup pools, it just stops doing anything when it encounters that error on the main pool. But the proxy is xmrig-proxy, if that's somehow relevant...

@Bendr0id
Owner

Without a proper log I can't help you. But to be honest, the only change in 1.8 is the algorithm. Nothing else. So if the problem exists, it is in all versions since 1.6.

I would bet that if I created a new build with just the version incremented, there would be people saying the old one was better.

@electroape
Author

Okay, I've reverted the miners to 1.7.0. They stop mining when the main proxy goes offline, with the same error message but without "no pools, stop mining", and they don't switch to backup pools either. But when the main pool comes back online they resume mining; that's the difference from 1.8.0.

@electroape
Author

I published the log on pastebin: https://pastebin.com/tqEpdSkE

@Bendr0id
Owner

When they lose the connection to the proxy they have to stop, because everything they mine after that is pure waste. When the pool connection is dropped, the miner will try again after some time.

How long did you wait?

@Bendr0id
Owner

I don't see a problem in your log. It starts mining after a while.

Btw, I still just see ?????? in the log.

@electroape
Author

Of course they stop when they lose the connection to the pool, but they don't switch to any backup pool either. There was probably a 5 min window between when I restarted the proxy and when I noticed that the miners weren't doing anything. And the miners are configured with 5 retries and a 1 sec timeout.

@electroape
Author

Oh, forgot to mention: that's the 1.7.0 behaviour, it resumes mining; 1.8.0 doesn't.

I'll get home in an hour and try to reproduce it again with Russian and English locales. Thanks for your attention.

@djfinch

djfinch commented Oct 17, 2018

Sorry for pointing at the proxy, but the log is literally unreadable... It might be something with Boost, but something similar was in xmrig <2.5.3, where the miner couldn't recover the connection or switch to a failover pool, and there is no Boost used in the original xmrig, so...

@electroape
Author

Full log > https://pastebin.com/vN0EV0e0

To trigger the bug you need to configure at least 3 pools: pool #1 is online and then goes offline, pool #2 is unresolvable (a nonexistent domain name, an unresponsive IP or a closed port), and pool #3 is the fallback pool.

When the miner first connects to pool #1 and that pool then goes offline, the miner gets stuck at pool #2 and doesn't go any further, suspending mining indefinitely (even when it is configured with 1 retry and a 1 second retry-pause). Note that if pool #1 is offline at startup, the miner successfully connects to fallback pool #3, skipping unresolvable pool #2; but when pool #1 comes online and then goes offline again, it still gets stuck at #2.

Hope that helps.
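
Since the trigger described above is an unresolvable entry sitting between the primary pool and the fallback, it can help to check a pool list for such entries before starting the miner. The following is only a minimal sketch under stated assumptions: `split_host_port` and `unresolvable_pools` are hypothetical helper names, not part of xmrigCC, and the check covers only the DNS case (an unresponsive IP or a closed port would need an actual connection attempt). IPv6 literals are not handled.

```python
import socket
from typing import Callable, List, Tuple


def split_host_port(pool: str) -> Tuple[str, int]:
    """Split a 'host:port' pool entry into its two parts."""
    host, _, port = pool.rpartition(":")
    return host, int(port)


def unresolvable_pools(
    pools: List[str],
    resolve: Callable = socket.getaddrinfo,
) -> List[str]:
    """Return the pool entries whose hostname cannot be resolved.

    `resolve` is injectable so the check can be exercised without
    touching the network (hypothetical design choice for this sketch).
    """
    bad = []
    for pool in pools:
        host, port = split_host_port(pool)
        try:
            resolve(host, port)
        except OSError:
            # socket.gaierror (DNS lookup failure) is an OSError subclass.
            bad.append(pool)
    return bad
```

Calling `unresolvable_pools([...])` with the entries from a miner config would list the ones that fail DNS lookup on the machine where it runs.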

@djfinch

djfinch commented Oct 17, 2018

So... I'm able to partially replicate this...
Testing env:

Pool 1 : xmr-node-proxy (XNP)
Pool 2 : example.com (dummy)
Pool 3 : supportxmr
Result: NOT GOOD. XNP is killed --> miner is stuck in a loop trying to reach Pool 2, which does not exist, and ignores Pool 3. Anyway, running the proxy again will resurrect the miner.
image

Pool 1 : xmr-node-proxy
Pool 2 : supportxmr
Pool 3 : example.com (dummy)
Result: GOOD. XNP killed --> miner switches to Pool 2. XNP is started --> miner connects back to the proxy. That's the expected behavior.
image

The 2nd test should work for you, too...
I saw in your log that the proxy pool was first and supportxmr second. You successfully got multiple jobs from supportxmr, so it works! However, I think your 25 H/s miner is not able to hash 5000 diff, which is the min-diff there.

And how is it possible that restarting the proxy does not help and force the miner to connect back? I don't know. Maybe it's stuck and occupying the port, but I cannot reproduce this. Anyway, when Pool 1 is available again, the miner will switch. Every time. Even if the 2nd pool is a dummy. Proofs are above... So the 1st example (dummy pool is 2nd) is definitely an issue. The 2nd example (2nd pool is working and the dummy is 3rd) should work everywhere, and your issue can be caused by env, boost, compiler, proxy, occupied port, weather, I don't know.

@electroape
Author

Thanks for the tests. Your 2nd test will not show this bug; I didn't test that case either.

As for resuming mining after the main pool goes offline and online again, I actually forgot to test that; I was too focused on the issue that I was seeing even on older versions (tested above). Will test that case now.

@Bendr0id
Owner

I'll record some videos tomorrow. But from what I've tested so far, 1.7 behaves exactly the same way 1.8 does.

So this has to be a coincidence.

@electroape
Author

electroape commented Oct 17, 2018

Silly me. The reason I created this ticket is that after a proxy restart a major part of my miners disappeared from the dashboard, so I assumed they had stopped mining. But that actually means that they either crashed or cannot connect to (or are refused by) the CC server. I'll try to reproduce this now.

So there are actually 2 separate bugs. One is the indefinite loop on an unresolvable pool, where the miner just stops mining; this bug has been here for at least two releases prior to 1.8.0, I just wasn't bothered enough to investigate it until now. The second one is still in question.

@electroape
Author

Okay, I upgraded to 1.8.0 again and restarted the proxy and CC server several times; it's not reproducing. So there's only one bug for now: the miner can't fall back to pools further down the list after an unresolvable one.

"I'll record some videos tomorrow. But from what I've tested so far, 1.7 behaves exactly the same way 1.8 does."

Exactly, this bug was there in 1.7.0 too :) Tell us if you still aren't able to reproduce it.

@Bendr0id
Owner

I have same results like @djfinch.

You're the one who said it was working on 1.7. I always said that 1.7 does the same as 1.8, and that's still valid.

@electroape
Author

I didn't quite get it: you don't consider the behaviour in @djfinch's test #1 a bug?

As for the second bug (miners disappearing from the dashboard for an unknown reason), I will watch closely and report if I stumble upon it again. I had per-miner logs disabled, so I couldn't see what was going on on the client side; I've enabled them now, so it will be easier to see what's going on.

@Bendr0id
Owner

I don't know what's misleading..

"I have same results like @djfinch."

In other words "I see the issue he was able to reproduce too". I was able to reproduce that.

But this is still a corner case. But I will look into it.

Why the hell aren't you fixing your proxies? Maybe you have DNS issues? I restart my proxies 50 times a day when hopping coins, and I don't have these issues...

@electroape
Author

Okay, got it.

That pool config is kind of a legacy workaround: my gateway was acting weird and I added an alternative IP as well; I don't remember exactly what happened.

Never mind, I'll remove these unresolvable pools for now as a workaround and report if there are problems with restarting the proxy/CC.

PS: 50 times a day? Holy cow... I'm not that dedicated :) Btw, how's progress with the proxy integration?

@Bendr0id
Owner

Can you please test this branch

https://github.com/Bendr0id/xmrigCC/tree/proper_handling_of_dns_issues

The cases described by @djfinch work now. DNS issues are now handled like normal connection errors: the miner jumps to the next pool and keeps retrying, and once the main pool is back it jumps back to it.
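
The idea of treating a DNS failure like any other failed connection attempt can be illustrated with a short Python sketch. This is not the code from the branch above (xmrigCC is written in C++); `pick_pool` and the injected `connect` callable are hypothetical names used only for illustration, and the retry count of 5 is taken from the logs discussed in this thread.

```python
import socket
from typing import Callable, List, Optional


def pick_pool(
    pools: List[str],
    connect: Callable[[str], None],
    retries: int = 5,
) -> Optional[str]:
    """Walk the pool list in priority order and return the first pool
    that accepts a connection, or None if every pool fails."""
    for pool in pools:
        for _ in range(retries):
            try:
                connect(pool)
                return pool
            except OSError:
                # socket.gaierror (DNS failure), ConnectionRefusedError and
                # TimeoutError are all OSError subclasses, so an unresolvable
                # hostname costs one attempt like any other failed connection
                # and can never block the walk down the list.
                continue
    return None
```

Re-running `pick_pool` from the top of the list at intervals would also give the "jump back to the main pool once it returns" behaviour, since the primary is always tried first.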

@Bendr0id
Owner

@uz-spark tested?

@electroape
Author

Uh, sorry, not fixed.

https://pastebin.com/JKFq435K

@Bendr0id
Owner

Bendr0id commented Oct 20, 2018

Why? It looks perfect: after 5 attempts it always tries to connect to the next one and keeps trying the others until it is able to connect to one, then it stops retrying. And at the end it is connected to one of your fallbacks.

[2018-10-20 19:11:57] [inexistant_domain_name:6666] Error: "[Connect] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
[2018-10-20 19:11:57] use pool de01.supportxmr.com:3333
[2018-10-20 19:11:57] new job from de01.supportxmr.com:3333 with diff 5000 and PoW 2
[2018-10-20 19:12:00] [$MY_PROXY$:6666] timeout

But it will always keep trying to connect to your primary server. Once it can connect to it, it jumps back to it. All of that is the expected behavior.

[2018-10-20 19:13:40] new job from $MY_PROXY$:6666 with diff 3000 and PoW 2

Can't see an issue here.

@electroape
Author

"Why? It looks perfect: after 5 attempts it always tries to connect to the next one and keeps trying the others until it is able to connect to one, then it stops retrying. And at the end it is connected to one of your fallbacks. But it will always try to connect to your primary server. All of that is the expected behavior."

"[2018-10-20 19:13:40] new job from $MY_PROXY$:6666 with diff 3000 and PoW 2"

"Can't see an issue here."

I started the proxy back up to see if the miner would connect to it. It's the main proxy, first in the pool list.

I've retested with clearer settings. The hashrate is low but the difficulty is 100, and although there's still some hashrate even after "no pools, stop mining", you can see that it doesn't submit anything to the failover proxy.

https://pastebin.com/Zy0sLgnT

@Bendr0id
Owner

I tested a lot of cases and it was always recovering, at least when the main proxy is back. In your log file there was a phase where no fallback was responsive. I don't know if that's a real use case.

@electroape
Author

"I tested a lot of cases and it was always recovering, at least when the main proxy is back. In your log file there was a phase where no fallback was responsive. I don't know if that's a real use case."

In that case the failover pool is my proxy on the same machine as the main proxy, just on another port. I don't know why it was unresponsive for a second; maybe the host was too busy (it's on a VM). But I don't see how that's not a real use case: I just replaced the regular pool with my proxy to lower the difficulty so I can see the hashrate more clearly.

@electroape
Author

I mean, I think that's a pretty normal use case:

Pool #1 - your main proxy
Pool #2 - your failover proxy on another host; if you decide to do maintenance on the host with the main proxy you bring your failover proxy online, otherwise it's not needed and stays offline
Pool #3 - a regular pool, just in case both your proxies are offline

@electroape
Author

I'll do a test with the config as close to default settings as possible to see if that's the issue; I don't know why you can't reproduce it.

@Bendr0id
Owner

https://pastebin.com/ZiNuptfc

Case 1:
First I turned off my main proxy; the miner jumps to the second proxy after 5 attempts. Turning the first proxy on, it jumps back to the main proxy.

Case 2:
First I turned off my main proxy; the miner jumps to the second proxy after 5 attempts. Turning off the second proxy, the miner jumps to the fallback. Turning on the main proxy, it jumps back to the main proxy.

@Bendr0id
Owner

https://pastebin.com/zMatn4yH

Case 3:
Main proxy down, 2nd proxy down, the miner jumps to the fallback. Turning on the first proxy, it jumps back to the main proxy.

@Bendr0id
Owner

Btw, the same holds when the 2nd proxy has bad DNS. Just tested it.

@electroape
Author

Here, I retested with an almost default config and with both proxies and the miner on the same machine.

https://pastebin.com/0Gxsv4S9

In your test you don't reproduce my steps. I'll describe them again in detail:

Pools :

  1. your main pool - it must be online and go offline at some point
  2. your failover pool - it's offline because it's a failover (huh) and you don't need it for now; at least that's my use case
  3. any other failover pool

Steps :

  1. If pool #1 is online on miner start - miner successfully connects to pool #3, skipping through unresponsive pool #2.
  2. Pool #1 goes offline, the miner loops trying to connect to pools #1 and #2 and doesn't check pool #3 at all

What you're missing in your test is that pool #2 must already be offline when pool #1 goes offline. Otherwise, even if pool #2 then goes offline too, the miner successfully connects to pool #3: pool #1 is already unresponsive, so the only thing it can do is try the other pools down the list, pool #3 in that case.
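
To make the outcome the reporter expects at each step explicit, the scenario can be written down as a small table of pool states and checked against a simple "first reachable pool wins" rule. This is only an illustrative sketch: the names `expected_pool`, `POOLS` and `timeline` are hypothetical, and the rule describes the expected behaviour, not what the miner actually did in the logs.

```python
from typing import Dict, List, Optional


def expected_pool(pools: List[str], online: Dict[str, bool]) -> Optional[str]:
    """Return the highest-priority pool that is currently reachable."""
    for pool in pools:
        if online.get(pool, False):
            return pool
    return None


POOLS = ["pool1", "pool2", "pool3"]

# Pool #2 is offline the whole time, as in the reported setup.
timeline = [
    ("start, pool #1 online", {"pool1": True, "pool2": False, "pool3": True}),
    ("pool #1 goes offline", {"pool1": False, "pool2": False, "pool3": True}),
]

for label, state in timeline:
    print(f"{label}: expect {expected_pool(POOLS, state)}")
```

Under this rule the second step should end on pool #3, whereas the report says the miner keeps looping over pools #1 and #2.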

@electroape
Author

So, were you able to confirm this or not?


3 participants