Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate limit for re-announcements #1131

Merged
merged 2 commits into from
Feb 1, 2023

Conversation

nazar-pc
Copy link
Member

@nazar-pc nazar-pc commented Feb 1, 2023

The problem

With large number of incoming announcements farmer may not be able to process them all and re-announce pieces in real time. This is especially true before cache is full where all announced pieces will fit into cache.

As the result we were getting following warnings:

2023-01-31T17:25:10.644120Z  WARN subspace_farmer::commands::farm::dsn: Failed to add provider record to the channel. record.key=Key(b"\xf8\xb9\xce\x05 r\xc9BLo9\xffZ\xb1\xa5\x929yj&0c\xa0Ta\xdbw\xca\xd9\xc4\xf4\xd0\xf4\x01w@\xbf") record.provider=PeerId("12D3KooWBTS6fNhWg79UZPZ9EEbGXsuxX2ptBt6Th2ovxkx24nw1")

The solution

While #1129 will solve this long-term, in the meantime this PR distinguishes difference processing phases:

  • initial checks, attempt to pull the piece from announcer are happening as before
  • re-announcement happens in the background task with its own limit

The rationale here is that as long as we pull and persist piece in the cache, it is still retrievable (because farmer did store it and the reason they did it is because they are close enough to the key), even if short-term we were not able to re-announce it.

The only negative effects when this happens are that not everyone on the network will be aware of all the nodes that have data and that may cause extra strain on nodes closer to the key.

Code contributor checklist:

@nazar-pc nazar-pc added farmer Farming library/app networking Subspace networking (DSN) performance Related to performance measurement or improvement labels Feb 1, 2023
@nazar-pc nazar-pc added this to the Gemini 3 milestone Feb 1, 2023
@nazar-pc nazar-pc requested review from i1i1 and rg3l3dr as code owners February 1, 2023 03:28
Copy link
Contributor

@shamil-gadelshin shamil-gadelshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems blocked by #1133

Pieces that were added to the local cache and with skipped announcements won't be visible outside.

@nazar-pc
Copy link
Member Author

nazar-pc commented Feb 1, 2023

This PR doesn't decrease the queue size, in fact it adds even more buffering on top, so it is strictly better than what we had before. Though #1133 will certainly improve things further.

@shamil-gadelshin
Copy link
Contributor

This PR doesn't decrease the queue size, in fact it adds even more buffering on top, so it is strictly better than what we had before. Though #1133 will certainly improve things further.

Isn't it possible to proceed with an external task (record processing) even if we have all the internal tasks (announcing) busy? This will make saving to the local cache unnecessary because of the no future effect.

We need to acquire reannouncing semaphore first to make it more strict.

@nazar-pc
Copy link
Member Author

nazar-pc commented Feb 1, 2023

Isn't it possible to proceed with an external task (record processing) even if we have all the internal tasks (announcing) busy?

Certainly possible

This will make saving to the local cache unnecessary because of the no future effect.

It will not have effect without #1133, but I see no reason why #1133 blocks this PR and makes it less useful. We already populate cache, which is great and will be beneficial.

We need to acquire reannouncing semaphore first to make it more strict.

I don't get this part, caching should happen regardless of whether we can re-announce afterwards or not.

@shamil-gadelshin
Copy link
Contributor

We don't need a cache unless we serve requests and there will be no requests unless we announce pieces. This PR in its current form introduces a situation when we have a piece in the cache but don't announce them (now or ever). We could mitigate it by rearranging semaphores. We could either implement #1133 and skip some of the announcements because they will be reannounced later.

@nazar-pc
Copy link
Member Author

nazar-pc commented Feb 1, 2023

there will be no requests unless we announce pieces

There will be requests in general, but we'll not be able to return node's info until #1133 is implemented, though I don't see how that reduces value of this PR and why it can't be merged first. It literally physically can't make things any worse than they are right now only better.

We could mitigate it by rearranging semaphores. We could either implement #1133 and skip some of the announcements because they will be reannounced later.

I don't understand why you want to re-arrange semaphores, sorry. With or without #1133 it doesn't make sense to me.

Copy link
Contributor

@i1i1 i1i1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shamil-gadelshin
Copy link
Contributor

Currently, we have:
semA, semB, taskA, taskB

check semA
	run taskA
		check semB
			run taskB

Since tasks are processed in separate threads (at least logical threads) - they don't block each other even having taskB is started within taskA.
That means that it's possible to run taskA and don't run taskB (it seems you agree with this possibility).

If we rearrange semaphores

check semB
check semA
	run taskA
		run taskB

We make taskA dependent on the taskB being runnable. That will mean that starting taskA will be always accompanied by the starting taskB.
Back to this PR - rearranging semaphores will guarantee that a piece will be both in the cache and announced. Without it - we will have some useless pieces in the cache.

OR we could fix #1133 and leave it as it is because we will have a different side effect - we could skip announcing pieces sometimes but they will be reannounced later.

@nazar-pc
Copy link
Member Author

nazar-pc commented Feb 1, 2023

Without it - we will have some useless pieces in the cache.

But that is the whole point of this PR: To cache pieces even if they can't be announced. With #1133 pieces that we have stored in the past will suddenly become accessible after farmer restart with new software.

OR we could fix #1133 and leave it as it is because we will have a different side effect - we could skip announcing pieces sometimes but they will be reannounced later.

Not could, we will. Moreover, we'll do it today.

@shamil-gadelshin
Copy link
Contributor

Without it - we will have some useless pieces in the cache.

But that is the whole point of this PR: To cache pieces even if they can't be announced. With #1133 pieces that we have stored in the past will suddenly become accessible after farmer restart with new software.

OR we could fix #1133 and leave it as it is because we will have a different side effect - we could skip announcing pieces sometimes but they will be reannounced later.

Not could, we will. Moreover, we'll do it today.

Ok. I can review it having this perspective in mind.

Copy link
Contributor

@shamil-gadelshin shamil-gadelshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I tested it and found one minor issue: announcing pieces logs an invalid message Piece publishing for a sector succeeded.... It's incorrect in this context. However, it wasn't introduced in the current PR.

@nazar-pc
Copy link
Member Author

nazar-pc commented Feb 1, 2023

I tested it and found one minor issue: announcing pieces logs an invalid message Piece publishing for a sector succeeded.... It's incorrect in this context. However, it wasn't introduced in the current PR.

Yeah, it was copy-pasted and needs to be made more generic indeed.

@nazar-pc nazar-pc enabled auto-merge February 1, 2023 09:42
@nazar-pc nazar-pc merged commit 10e23e0 into main Feb 1, 2023
@nazar-pc nazar-pc deleted the separate-limit-for-re-announcements branch February 1, 2023 10:08
@nazar-pc nazar-pc mentioned this pull request Feb 2, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
farmer Farming library/app networking Subspace networking (DSN) performance Related to performance measurement or improvement
Projects
Development

Successfully merging this pull request may close these issues.

3 participants