Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ms.mempool locking #9050

Merged
merged 92 commits into from
Nov 4, 2021
Merged

Ms.mempool locking #9050

merged 92 commits into from
Nov 4, 2021

Conversation

mariano54
Copy link
Contributor

@mariano54 mariano54 commented Oct 31, 2021

Many improvements with two goals:

  • Improve block propagation and validation times, to make sure everyone can farm at all times, and blockchain health does not deteriorate in times of load,
  • Improve how many transactions per second each node can accept, especially for low end hardware.

Improvements

  1. The Process pool for validating transactions has been increased from 1 to 2, to better utilize idle cores.
  2. Validation of signatures for new transactions now happens in a different process (in the process pool where CLVM was getting validated). This means we can do more TPS and free the main thread for block validation.
  3. For unfinished block validation, CLVM and signatures are validated outside the blockchain lock (and clvm in the process pool executor)
  4. The mempool validation (which reads coin store) now uses the same lock as block validation (blockchain.lock). This means that block validation gets full priority of using IO and the main thread CPU.
  5. The blockchain lock has been changed to use a LockQueue, where different clients can use different lock priorities. Block validation gets high priority, and mempool low, so that block validation is always prioritized.
  6. A transaction queue is added so that we don't have to keep many long running asyncio tasks from the server, and can keep things in the queue for a while until the node has some free time to validate them. Things are pulled from the cache 200 at a time (with the semaphore) so that we can efficiently validate transactions in different processes. However, note that once a transaction has been selected from the queue, it still might need to wait for the blockchain lock later on. Before, with no queue, transactions would all try to get validated at once, even if the node could not handle them.
  7. respond_transction in the full node protocol now returns instantly and puts TX in the queue.
  8. send_transaction in the wallet protocol maintains it's behavior, but internally pushes transactions into the queue and waits for a result before returning.
  9. CancelledException used to put the node into an invalid state, now it does not; we ensure to call peak_post_processing when tasks are cancelled.
  10. peak_post_processing has been divided into two methods, peak_post_processing and peak_post_processing_2. The first one must be called under the blockchain lock, since it modifies full node state. The second one can be called outside the lock, and it deals with sending blocks and messages to other peers and wallets. Before, they were both called inside the lock.
  11. When the mempool receives a new block, it must go through the whole mempool and re-check all transactions to see if they are still valid. WIth a new optimization, we now only check to see if the removals were removed in the block, if not, the spendbundle must be valid, so we don't check it.
  12. New peaks can also stack up on each other in times of high load, so we limit the number of pending messages, and discard the rest. Also, only 2 run in parallel instead of 8, leading to faster catch-up times in RPI.
  13. When an API times out, we no longer close the connection. This was making slow peers get disconnected in times of high load.
  14. Reduce the number of peers we re-ask for a transaction when the first peer is unresponsive, because we might drop some transactions due to load, so we don't want to keep asking for them.
  15. The pairing cache has been increased from 10k to 50k for more cache hits.
  16. Lazy pairing cache subgroup check to save ~1.3 seconds per block validation on RPI

@mariano54 mariano54 requested a review from altendky October 31, 2021 20:08
@mariano54 mariano54 marked this pull request as draft October 31, 2021 20:08
@lgtm-com
Copy link

lgtm-com bot commented Oct 31, 2021

This pull request introduces 3 alerts when merging 01f3688 into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import
  • 1 for Incomplete ordering

@lgtm-com
Copy link

lgtm-com bot commented Nov 1, 2021

This pull request introduces 1 alert when merging bf4a33d into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Nov 1, 2021

This pull request introduces 1 alert when merging c44e676 into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Nov 1, 2021

This pull request introduces 1 alert when merging e5ee15a into 3df0f9e - view on LGTM.com

new alerts:

  • 1 for Unused local variable

chia/full_node/full_node.py Outdated Show resolved Hide resolved
chia/full_node/full_node.py Outdated Show resolved Hide resolved
chia/full_node/full_node_api.py Outdated Show resolved Hide resolved
chia/full_node/full_node_api.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented Nov 3, 2021

This pull request introduces 1 alert when merging 516b7cf into eb62a28 - view on LGTM.com

new alerts:

  • 1 for Unused import

@mariano54 mariano54 marked this pull request as ready for review November 3, 2021 23:09

f = getattr(self.api, message_type, None)
if len(message_types) % 100 == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are changes in chia/server/server.py just logging every 100th message per type ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are logging the current state (which API tasks are still running) for debugging purposes

@@ -493,12 +495,11 @@ def connection_closed(self, connection: WSChiaConnection, ban_time: int):
f"Invalid connection type for connection {connection.peer_host},"
f" while closing. Handshake never finished."
)
self.cancel_tasks_from_peer(connection.peer_node_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there some race going on here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it got moved before on disconnect)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No i just did that as a precaution, i was scared the stuff below it might throw a exception

@@ -484,30 +513,52 @@ async def new_peak(self, new_peak: Optional[BlockRecord]) -> List[Tuple[SpendBun
return []
assert new_peak.timestamp is not None

use_optimization: bool = self.peak is not None and new_peak.prev_transaction_block_hash == self.peak.header_hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed, I though changes are tracked across batch updates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well after sync we call this method, but we dont have all of the coin changes since the start of the sync. We could instead clear the mepool after the sync.

Copy link
Contributor

@wjblanke wjblanke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aok

@wjblanke wjblanke merged commit 8a028c3 into main Nov 4, 2021
@wjblanke wjblanke deleted the ms.mempool_locking branch November 4, 2021 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants