Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PVF: Throw error if worker process cannot spawn necessary threads #659

Open
mrcnski opened this issue Apr 10, 2023 · 4 comments
Open

PVF: Throw error if worker process cannot spawn necessary threads #659

mrcnski opened this issue Apr 10, 2023 · 4 comments
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be.

Comments

@mrcnski
Copy link
Contributor

mrcnski commented Apr 10, 2023

ISSUE

Overview

This issue is about the following quote from tokio::task::spawn_blocking (used by us to create threads in the worker process):

Tokio will spawn more blocking threads when they are requested through this function until the upper limit configured on the Builder is reached. After reaching the upper limit, the tasks are put in a queue. The thread limit is very large by default, because spawn_blocking is often used for various kinds of IO operations that cannot be performed asynchronously. When you run CPU-bound code using spawn_blocking, you should keep this large upper limit in mind.

Why is this an issue? Because for deterministic results we require that all worker threads are spawned at the same time, and not staggered. For example, if the CPU time monitor thread is late to start, we may timeout later than other validators, and accept candidates that others do not. (Also, we may at some point start using the measurements from the memory stats thread to reject PVFs, though right now this is preparation-only so inaccurate measurements cannot lead to disputes.)

Proposal

If we cannot spawn all the threads immediately, we should return an internal error (i.e. not dispute). This would lead to a retry (since paritytech/polkadot#7011).

That said, as the default limit in tokio is very high, it is not clear how it can ever be reached in practice. Each execution job uses 2 threads, prepare jobs use 3. So I would say this is low priority, but something to keep in mind.

@jpserrat
Copy link
Contributor

jpserrat commented Aug 7, 2023

Hey @mrcnski, I'm going to start working on this one now. I'm probably gonna take a little longer just to get the context on this one haha. Let me know if you have any suggestions.

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 7, 2023

Hey @jpserrat! As there might be a refactor of relevant code coming soon with this issue, I would suggest waiting on this one right now. If you want, you can try to think about how to approach this in the meantime - I would suggest familiarizing yourself with the current code that deals with threads and learning about Rust's synchronization primitives, that might give you some ideas ;) Otherwise I can help you select a different issue. Up to you!

@jpserrat
Copy link
Contributor

jpserrat commented Aug 7, 2023

I'm going to follow your advice regarding the current code and Rust threads. Thank you very much for your guidance. It would also be great if you could recommend a different one.

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 8, 2023

@jpserrat Awesome! This one would be pretty quick and easy: #695. It may not end up being needed, but I think it probably will, and definitely wouldn't hurt to have. You can even add a sanity-check before each execution to make sure that the current node version matches the version recorded in the filename.

This is another easy one: #622

I found a couple more but they are somewhat non-trivial, would be a big help if you dove into them though:

  1. https://github.com/paritytech/polkadot/issues/7042
  2. Preparation errors should no longer be a InvalidCandidate variant #733

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. T8-parachains_engineering and removed J7-mentor labels Aug 25, 2023
claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
…h#637)

* WIP: attempt at new data structures

* Specify path to config trait

* Check AccountId's traits for EnqueuedMessage

* Reorder MessageBundle type params

* Add trait bounds to AccountId

* Don't store AccountId in MessageBundle messages

The AccountId is already in the MessageBundle, so no need to add it to
every message.

* Ensure that AccountId implements MaxEncodedLen

* Finish rename to Nonces

* Whitespace

* Remove use of principal

* Fix struct access in average_payload_size

* WIP Collect MessageBundles partitioned by AccountId

* Convert message queue to map of message bundles

* Fix account id used for nonces in test

* Fix warnings

* Convert message bundles to Vec

* Remove last uses of principal in basic channel

* Found a better way to handle this

Just use BoundedVec

* Insert spaces to help with Markdown folding

* Fix BTreeMap import

* Revert "Remove last uses of principal in basic channel"

This reverts commit 2f5d6e07a56bf74baeaf6f0421b68beb77b17644.

* Fix some benchmarking types

* Remove last uses of principal in basic channel

Leave the set_principal call in place as a no-op to avoid removing the
config for Call. We'll re-add it when we add the leaf node proof anyway.

* First stab at Merklizing all message bundles

I mean, if it compiles... 🤷

* Make keccak256 public to try resolve import issues

* Use configured Hash type

* rustfmt

* Fix tests in merkle_proof

* Fix nonce update

* Remove not_authorized test

* Add TODO

* Ignore large tests

* rustfmt

* Store all message bundles in the event

* Clean up commit() a bit

* Extract function make_message_bundles

* Eth-encode message bundles before merklizing them

* Update comments

* Remove unused function

* Persist ABI-encoded message bundles

* Use SCALE encoding for off-chain message bundles

* Fix debug import

* Switch to Token::FixedBytes

* s/map/for/

* Remove loop implementation

* Remove generic output type

* rustfmt

* Add StoredLeaves struct to communicate intent

* WIP Custom RPC and runtime API

* fixes (paritytech#647)

* Update snow{blink,bridge} with fixes

* Move merkle_proof to a separate crate

* Add RPC call stub

* Add runtime API implementation

* Remove basic channel's RPC runtime API

* WIP SCALE-encode Merkle proof from RPC call

* Move StoredLeaves struct into primitives crate

* Disable default features in basic channel crates

* Add import for function

* Use u64 instead of usize for proofs

* Fix missed issues from removing the runtime API

* Fix merkle-proof tests

* Update basic outbound channel's tests for commit()

* rustfmt

* Clean up RPC call

* Fix typo & remove unused import & comment

* Replace encode with as_ref for ethabi encoding

* Remove AsRef implementation for MessageBundle

* Fix up RPC crate

* Remove unused dependencies in basic channel

* Remove redundant package names

* Use import alias

* Reorder rpc pallet

* rustfmt

* Mention outbound in Merkle proof RPC

* Remove unused error

* Remove set_principal from basic outbound channel

* Update docstring for commit()

* Ignore unused result

* Shorten type

* Remove unused dependency

* Add prefix to incentivized channel in runtimes

* Remove unused import

* Remove primitives crate

* Add V2 of basic inbound channel

* Fix typos

* Re-order message types

* Fix use of nonces

* Generate commitment from message bundle & proof

* Fix contract name

* Generate basic inbound channel v2 go bindings

* Fix RPC name

* Fix rpc handler (paritytech#659)

* Detect leaf index out of bounds in RPC

* Add tests for RPC handler

* Overwrite BasicInboundChannel instead of using V2

* Link MerkleProof library for testing

* Typos

* Add account id filter to parachain relayer

* Add MerkleProof library to channel deployment

* Add MerkleProof RPC to parachain relayer

* Extract account id decoding to getter

* Simplify search for matching bundle

* Forward Merkle proof to inbound channel contract

* Fix mapstructure tag

* Move decoded account id to listener struct

* Tweak error messages

* Extract finding bundle

* Extract fetchBundleProof

* Use existing parachain connection

* Fix account id decoding

* Fetch basic channel nonce by accountID

* Reuse accountID stored on startup

* Fix listener decoding & writer bounds check

* Add account id to message bundle log

* Remove TODO

This is only relevant if we expect to have around 2**63 different
accounts with messages in the same commitment.

* Add RPC name to error messages

* Use the Alice account id as default for ACCOUNT_ID

* Mention build-essential for setting up lodestar

* Fix log file names

* Log leafProof and hashSides in relayer

* Fix hash side value

Should be hashed on the left when the side is true, which is when the
index is even.

* Use same names and param order as contract

* Update contract fixture data

* Revert "Fix hash side value"

This reverts commit ec0b2e9aa7ee2a8605f68152392983e344f0057f.

* Add reminder to use a port of a relay chain node

* Fix basic inbound channel contract tests

* Mention port forwarding in port reminder

* Replace pointer with byte array

* Improve error logs

* Return proof value instead of mutating pointer

* Add second message bundle to basic channel test

* WIP Add second account to basic channel e2e test

* Start accountID with lowercase

* Update new dependencies to 0.9.25

* Fix RPC handler tests

* Update parachain metadata

* Allow multiple accountIDs in parachain relayer

* Fix scanning logic

* Update basic nonces log

* Fix initial len of nonces slice

* Fix jq for multiple account ids

* Add bob key for E2E test

* Check correct colletion's length to stop scanning

* Fix param description

* Whitespace

* Ignore local asdf versions

* Replace base 2 log with simpler bit calculation

* Remove resolved TODOs

* Typo

* Add comment explaining ACCOUNT_IDS

* s/accountIDs/accounts/g

* Add issue key to TODO

* Typo

* Add geth version to README

* Fix templating

* Update parachain lockfile

* Fix recipient address

* Fix dotapp tests after rename

Co-authored-by: Vincent Geddes <[email protected]>
Co-authored-by: Vincent Geddes <[email protected]>
bkchr pushed a commit that referenced this issue Apr 10, 2024
* fixed messages count check

* explicit check of `messages_count` in the receive_messages_proof

* change messages_count to be u32

* Update modules/message-lane/src/lib.rs

Co-authored-by: Hernando Castano <[email protected]>

Co-authored-by: Hernando Castano <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be.
Projects
Status: Backlog
Development

No branches or pull requests

3 participants