tortoise: support multiple smeshers #5087
bors bot
pushed a commit
that referenced
this issue
Oct 12, 2023
…5130)

related: #5113
it drops the RefBallot function, and will allow dropping that index in a followup

related: #5106
it eliminates repetitive disk reads and makes potentially expensive calls more transparent. added latency of execution to the logs

related: #5087
refactoring to draw a line between per-smesher data that needs to be loaded once per epoch, and calls to external components. tortoise/mesh hash are reusable for every smesher, get txs is not reusable. it makes adding support for multiple smeshers significantly simpler
dshulyak
added a commit
to dshulyak/go-spacemesh
that referenced
this issue
Oct 13, 2023
…pacemeshos#5130)

related: spacemeshos#5113
it drops the RefBallot function, and will allow dropping that index in a followup

related: spacemeshos#5106
it eliminates repetitive disk reads and makes potentially expensive calls more transparent. added latency of execution to the logs

related: spacemeshos#5087
refactoring to draw a line between per-smesher data that needs to be loaded once per epoch, and calls to external components. tortoise/mesh hash are reusable for every smesher, get txs is not reusable. it makes adding support for multiple smeshers significantly simpler
bors bot
pushed a commit
that referenced
this issue
Oct 16, 2023
closes: #5087

data is separated into shared data (beacon and active set) and signer-specific data. both beacon and activeset are used from shared data until a smesher has generated a reference ballot. once any smesher has generated a ballot, it will use the data recorded in the reference ballot.

build method now loops over all signers (copied at the start of the layer). there are parts that have to be run once for all signers and parts that make sense to run in parallel.

serial parts:
- loading shared data (beacon and active set)
- deciding on mesh hash
- tally votes & encode votes tortoise calls

parallel parts:
- loading data (it can also be run serially, but it was convenient to run it in parallel)
- computing eligibilities (this is done once per node startup)
- selecting txs
- publishing proposal. this is the most important one, to avoid blocking serially in Publish while it runs validation

worker pool (errgroup) is limited by the number of cores, as there are no network requests during parallel work.
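the serial/parallel split above can be sketched roughly as follows. this is a minimal illustration, not the code from the commit: `shared`, `loadShared`, `buildFor` and `build` are hypothetical names, and the worker limit is implemented with a stdlib `sync.WaitGroup` plus a semaphore channel instead of the errgroup the commit mentions, only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// shared holds data computed once per layer for all signers (hypothetical type).
type shared struct {
	beacon   string
	meshHash string
	votes    string
}

// loadShared stands in for the serial steps: shared data, mesh hash, encoded votes.
func loadShared(layer int) shared {
	return shared{
		beacon:   "beacon",
		meshHash: fmt.Sprintf("mesh-%d", layer),
		votes:    fmt.Sprintf("votes-%d", layer),
	}
}

// buildFor stands in for the per-signer steps: eligibilities, txs, publish.
func buildFor(signer string, s shared) string {
	return signer + ":" + s.meshHash + ":" + s.votes
}

// build runs the serial part once, then fans out over signers with at most
// `limit` goroutines running at a time.
func build(layer int, signers []string, limit int) []string {
	s := loadShared(layer) // serial: once for all signers

	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		out []string
		sem = make(chan struct{}, limit)
	)
	for _, signer := range signers {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` workers are busy
		go func(signer string) {
			defer wg.Done()
			defer func() { <-sem }()
			p := buildFor(signer, s)
			mu.Lock()
			out = append(out, p)
			mu.Unlock()
		}(signer)
	}
	wg.Wait()
	sort.Strings(out) // deterministic order for the example only
	return out
}

func main() {
	// limit by number of cores, as the parallel part is assumed to be CPU-bound
	fmt.Println(build(7, []string{"a", "b", "c"}, runtime.NumCPU()))
}
```

the point of the shape is that `loadShared` is called exactly once per layer, while a slow `buildFor` for one signer does not delay the others beyond the pool limit.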
poszu
pushed a commit
that referenced
this issue
Oct 17, 2023
closes: #5087

data is separated into shared data (beacon and active set) and signer-specific data. both beacon and activeset are used from shared data until a smesher has generated a reference ballot. once any smesher has generated a ballot, it will use the data recorded in the reference ballot.

build method now loops over all signers (copied at the start of the layer). there are parts that have to be run once for all signers and parts that make sense to run in parallel.

serial parts:
- loading shared data (beacon and active set)
- deciding on mesh hash
- tally votes & encode votes tortoise calls

parallel parts:
- loading data (it can also be run serially, but it was convenient to run it in parallel)
- computing eligibilities (this is done once per node startup)
- selecting txs
- publishing proposal. this is the most important one, to avoid blocking serially in Publish while it runs validation

worker pool (errgroup) is limited by the number of cores, as there are no network requests during parallel work.
consensus state is isolated in the tortoise module. all changes related to multiple smeshers support should be in the miner module. for a consistent interface for registration please see #5085
computing active set and eligibilities
activeset should be prepared only once per epoch, regardless of the number of registered smeshers.
need to be careful not to create multiple parallel readers where each one will try to prepare its own activeset.
data structure should reflect that we store only one copy of activeset. one way to implement it would be to check on startup whether any of the registered smeshers already created an activeset; if so, we cache that activeset, otherwise the code prepares the activeset before going into the parallel part.
the other non-parallel part is computing tortoise.EncodeVotes only once per layer, regardless of the number of registered smeshers.
the parallel part consists of preparing eligibility cache per smesher, selecting txs for proposals and signing transactions.
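a minimal sketch of the "one copy of activeset per epoch" idea, assuming a small cache guarded by a mutex. `activeSetCache`, `get` and the `prepare` callback are hypothetical names for illustration, not existing go-spacemesh APIs. the mutex is held while preparing, so parallel callers wait for the single preparation instead of each starting their own.

```go
package main

import (
	"fmt"
	"sync"
)

// activeSetCache keeps a single active set for one epoch (hypothetical type).
type activeSetCache struct {
	mu       sync.Mutex
	epoch    int
	set      []string
	ready    bool
	prepares int // counts how often the expensive path ran
}

// get returns the cached active set for the epoch, preparing it at most once.
// The mutex is held during preparation so concurrent callers wait rather than
// each starting their own preparation.
func (c *activeSetCache) get(epoch int, prepare func(int) []string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.ready || c.epoch != epoch {
		c.set = prepare(epoch)
		c.epoch = epoch
		c.ready = true
		c.prepares++
	}
	return c.set
}

func main() {
	cache := &activeSetCache{}
	// prepare is a stand-in for the expensive active set computation
	prepare := func(epoch int) []string {
		return []string{fmt.Sprintf("atx-%d-1", epoch), fmt.Sprintf("atx-%d-2", epoch)}
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // eight smeshers asking for the same epoch
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.get(3, prepare)
		}()
	}
	wg.Wait()
	fmt.Println(len(cache.get(3, prepare)), cache.prepares)
}
```

preparing the activeset before entering the parallel part, as the text suggests, would avoid the lock contention entirely; the cache above only shows how to stay safe if a parallel caller does reach it.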
oracle refactoring
oracle should not store any state. no signers, no eligibilities.
all shared state should be stored on miner instance, the structure should reflect what we compute in parallel and what not.
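one possible layout, as a sketch only: the oracle becomes a plain function with no fields, and the miner owns both the shared state and the per-signer state. all names here (`eligibleSlots`, `signerSession`, `miner`, `computeEligibilities`) are hypothetical, and the proportional formula is a placeholder, not the real eligibility rule.

```go
package main

import "fmt"

// eligibleSlots is a stateless oracle call: everything it needs is passed in
// and nothing is remembered between calls. The formula is a placeholder.
func eligibleSlots(weight, totalWeight, slotsPerEpoch uint64) uint64 {
	if totalWeight == 0 {
		return 0
	}
	return weight * slotsPerEpoch / totalWeight
}

// signerSession is per-signer state, filled in the parallel part.
type signerSession struct {
	weight uint64
	slots  uint64
}

// miner owns all state: the shared part is computed once, the per-signer
// part is computed for each registered smesher.
type miner struct {
	shared struct {
		totalWeight   uint64
		slotsPerEpoch uint64
	}
	signers map[string]*signerSession
}

// computeEligibilities fills per-signer state using only the stateless oracle
// function and the shared data stored on the miner.
func (m *miner) computeEligibilities() {
	for _, s := range m.signers {
		s.slots = eligibleSlots(s.weight, m.shared.totalWeight, m.shared.slotsPerEpoch)
	}
}

func main() {
	m := &miner{signers: map[string]*signerSession{
		"a": {weight: 10},
		"b": {weight: 30},
	}}
	m.shared.totalWeight = 40
	m.shared.slotsPerEpoch = 200
	m.computeEligibilities()
	fmt.Println(m.signers["a"].slots, m.signers["b"].slots)
}
```

splitting the struct into `shared` and `signers` makes it visible in the type itself which data is computed once and which is computed per smesher.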
skipping vrf/sig validation on self-publish
validating vrf and signature on our own proposals is unnecessary work. without this refactoring, that validation will require parallelization as well