tortoise: support multiple smeshers #5087

Closed
Tracked by #261
dshulyak opened this issue Sep 26, 2023 · 0 comments

consensus state is isolated in the tortoise module.
all changes related to multiple-smesher support should be in the miner module.

for a consistent registration interface, please see #5085

computing active set and eligibilities

the activeset should be prepared only once per epoch, regardless of the number of registered smeshers.
we need to be careful not to create multiple parallel readers that each try to prepare their own activeset.

the data structure should reflect that we store only one copy of the activeset. one way to implement this is to check on startup whether any of the registered smeshers has already created an activeset; if so, we cache that activeset. otherwise the code prepares the activeset before entering the parallel part.
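
a minimal sketch of the "one copy, prepared once" idea. this is only one possible approach (a mutex-guarded cache), not the actual miner code; `EpochID`, `NodeID`, `activeSetCache` and `countPrepares` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// EpochID and NodeID are hypothetical stand-ins for the node's real types.
type EpochID uint32
type NodeID string

// activeSetCache keeps a single copy of the active set for one epoch.
type activeSetCache struct {
	mu    sync.Mutex
	epoch EpochID
	set   []NodeID
	ready bool
}

// get returns the cached active set for epoch. prepare runs at most once per
// epoch, even when many smeshers ask concurrently, because the lock is held
// for the whole check-and-prepare step.
func (c *activeSetCache) get(epoch EpochID, prepare func(EpochID) ([]NodeID, error)) ([]NodeID, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.ready && c.epoch == epoch {
		return c.set, nil
	}
	set, err := prepare(epoch)
	if err != nil {
		return nil, err
	}
	c.epoch, c.set, c.ready = epoch, set, true
	return set, nil
}

// countPrepares starts one concurrent reader per smesher against a single
// cache and reports how many times the expensive prepare step actually ran.
func countPrepares(smeshers int) int {
	var cache activeSetCache
	calls := 0
	prepare := func(EpochID) ([]NodeID, error) {
		calls++ // only reached while cache.mu is held
		return []NodeID{"a", "b", "c"}, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < smeshers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = cache.get(7, prepare)
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println("prepare calls:", countPrepares(8)) // prints "prepare calls: 1"
}
```

preparing the set before the parallel part starts, as described above, would avoid the lock entirely; the lock only matters if readers can race.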

the other non-parallel part is computing tortoise.EncodeVotes, which should happen only once per layer, regardless of the number of registered smeshers.

the parallel part consists of preparing eligibility cache per smesher, selecting txs for proposals and signing transactions.
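
the split between the serial and parallel parts could look roughly like this. it is a sketch under assumptions: `encodeVotes` is a placeholder for the real tortoise.EncodeVotes call (whose signature is not shown in this issue), and `Votes`, `Proposal` and `buildLayer` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Votes and Proposal are hypothetical placeholders for the real types.
type Votes struct{ Layer uint32 }

type Proposal struct {
	Smesher string
	Votes   Votes
}

// encodeVotes stands in for the tortoise.EncodeVotes call. it is assumed to
// be expensive, so it runs once per layer.
func encodeVotes(layer uint32) Votes { return Votes{Layer: layer} }

// buildLayer does the serial work once and fans out the per-smesher work.
func buildLayer(layer uint32, smeshers []string) []Proposal {
	votes := encodeVotes(layer) // serial: shared by every smesher

	proposals := make([]Proposal, len(smeshers))
	var wg sync.WaitGroup
	for i, id := range smeshers {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			// parallel: eligibility lookup, tx selection and signing
			// would happen here, once per smesher.
			proposals[i] = Proposal{Smesher: id, Votes: votes}
		}(i, id)
	}
	wg.Wait()
	return proposals
}

func main() {
	for _, p := range buildLayer(10, []string{"s1", "s2", "s3"}) {
		fmt.Println(p.Smesher, p.Votes.Layer)
	}
}
```

each goroutine writes to its own slice index, so no extra locking is needed for the results.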

oracle refactoring

the oracle should not store any state: no signers, no eligibilities.
all shared state should be stored on the miner instance, and its structure should reflect what we compute in parallel and what we do not.
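
to illustrate the shape this suggests, here is a sketch where the oracle is a stateless helper and the miner owns both shared and per-smesher state. the eligibility rule below is a toy hash check invented for the example, not the real protocol, and all type and method names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// oracle holds no signers and no cached eligibilities. everything it needs
// arrives as arguments.
type oracle struct{}

// eligibleLayers applies a toy rule (NOT the real protocol): a smesher is
// eligible in a layer when the first byte of sha256(beacon/smesher/layer)
// is divisible by 4.
func (oracle) eligibleLayers(beacon, smesher string, layers uint32) []uint32 {
	var out []uint32
	for l := uint32(0); l < layers; l++ {
		var buf [4]byte
		binary.LittleEndian.PutUint32(buf[:], l)
		h := sha256.Sum256(append([]byte(beacon+"/"+smesher+"/"), buf[:]...))
		if h[0]%4 == 0 {
			out = append(out, l)
		}
	}
	return out
}

// miner owns all state: the shared beacon and the per-smesher cache.
// the map is not safe for concurrent use; a real implementation would
// need a lock or one entry per worker.
type miner struct {
	oracle        oracle
	beacon        string              // shared by all smeshers
	eligibilities map[string][]uint32 // per smesher
}

// eligibilitiesFor computes a smesher's eligibilities once and caches them
// on the miner, not on the oracle.
func (m *miner) eligibilitiesFor(smesher string, layers uint32) []uint32 {
	if cached, ok := m.eligibilities[smesher]; ok {
		return cached
	}
	e := m.oracle.eligibleLayers(m.beacon, smesher, layers)
	m.eligibilities[smesher] = e
	return e
}

func main() {
	m := &miner{beacon: "beacon", eligibilities: map[string][]uint32{}}
	for _, s := range []string{"s1", "s2"} {
		fmt.Println(s, m.eligibilitiesFor(s, 32))
	}
}
```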

skipping vrf/sig validation on self-publish

validating our own vrf and signature on self-publish is unnecessary work. if it is not skipped, that validation will require parallelization as well.
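
a rough sketch of the skip, assuming the handler can tell that a message originated from the local node by comparing the sending peer to its own id. that mechanism is an assumption made for this example, and `handler`, `Ballot` and the toy signature check are hypothetical.

```go
package main

import "fmt"

// Ballot is a hypothetical, heavily simplified message.
type Ballot struct {
	Smesher   string
	Signature string
}

// handler counts validations so the effect of the skip is visible.
type handler struct {
	self        string // id of the local node (assumed to be known)
	validations int
}

// verify is a toy stand-in for the expensive vrf and signature checks.
func (h *handler) verify(b Ballot) bool {
	h.validations++
	return b.Signature == "sig:"+b.Smesher
}

// handle skips validation only when the message was published by this node.
// the decision is based on the sending peer, not on the smesher id claimed
// inside the ballot, so a remote peer cannot bypass validation.
func (h *handler) handle(peer string, b Ballot) bool {
	if peer == h.self {
		return true // self-published: we produced the vrf and signature ourselves
	}
	return h.verify(b)
}

func main() {
	h := &handler{self: "me"}
	b := Ballot{Smesher: "s1", Signature: "sig:s1"}
	fmt.Println(h.handle("me", b), h.validations)    // prints "true 0"
	fmt.Println(h.handle("other", b), h.validations) // prints "true 1"
}
```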

@dshulyak dshulyak moved this to 📋 Backlog in Dev team kanban Sep 26, 2023
@dshulyak dshulyak moved this from 📋 Backlog to 🔖 Next in Dev team kanban Sep 27, 2023
@dshulyak dshulyak self-assigned this Oct 5, 2023
@dshulyak dshulyak moved this from 🔖 Next to 🏗 Doing in Dev team kanban Oct 5, 2023
bors bot pushed a commit that referenced this issue Oct 12, 2023
…5130)

related: #5113 

it drops the RefBallot function, which will allow dropping that index in a follow-up

related: #5106 

it eliminates repetitive disk reads and makes potentially expensive calls more transparent. it also adds execution latency to the logs

related: #5087 

refactoring to draw a line between per-smesher data that needs to be loaded once per epoch, and calls to external components. tortoise/mesh hash are reusable for every smesher, get txs is not reusable. it makes adding support for multiple smeshers significantly simpler
dshulyak added a commit to dshulyak/go-spacemesh that referenced this issue Oct 13, 2023
…pacemeshos#5130)

bors bot pushed a commit that referenced this issue Oct 16, 2023
closes: #5087

data is separated into shared data (beacon and active set) and signer-specific data. both beacon and activeset are taken from the shared data until a smesher generates a reference ballot. once a smesher has generated a reference ballot, it uses the data recorded in that ballot.
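
one way to picture this separation, reading "it" above as the smesher that generated the ballot. this is a sketch, not the merged code; `epochData`, `signerSession`, `session` and `dataFor` are hypothetical names.

```go
package main

import "fmt"

// epochData is the data a smesher needs for an epoch.
type epochData struct {
	Beacon    string
	ActiveSet []string
}

// signerSession holds signer-specific data.
type signerSession struct {
	// refBallot is nil until this signer has generated its reference
	// ballot; afterwards it holds the data recorded in that ballot.
	refBallot *epochData
}

// session combines the single shared copy with per-signer sessions.
type session struct {
	shared  epochData
	signers map[string]*signerSession
}

// dataFor returns the reference ballot data when the signer has one and
// falls back to the shared data otherwise.
func (s *session) dataFor(signer string) epochData {
	if ss, ok := s.signers[signer]; ok && ss.refBallot != nil {
		return *ss.refBallot
	}
	return s.shared
}

func main() {
	s := &session{
		shared: epochData{Beacon: "shared-beacon", ActiveSet: []string{"a", "b"}},
		signers: map[string]*signerSession{
			"s1": {refBallot: &epochData{Beacon: "ref-beacon", ActiveSet: []string{"a"}}},
			"s2": {},
		},
	}
	fmt.Println(s.dataFor("s1").Beacon) // prints "ref-beacon"
	fmt.Println(s.dataFor("s2").Beacon) // prints "shared-beacon"
}
```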

the build method now loops over all signers (copied at the start of the layer). some parts have to run once for all signers, and some parts make sense to run in parallel.

serial parts:
- loading shared data (beacon and active set)
- deciding on mesh hash
- tortoise calls: tally votes and encode votes

parallel parts:
- loading data (it can be also run serially, but it was convenient to run it in parallel)
- computing eligibilities (this is done once per node startup)
- selecting txs
- publishing the proposal. this is the most important one, as it avoids blocking serially in Publish while it runs validation

the worker pool (errgroup) is limited by the number of cores, as there are no network requests during the parallel work.
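
the commit uses errgroup for this. to keep the example free of external modules, the sketch below swaps in a buffered-channel semaphore from the standard library, which gives the same "at most N tasks in flight" behaviour. `runLimited` and `peakConcurrency` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runLimited runs task(i) for every i in [0, n) with at most limit tasks
// in flight, waits for all of them and returns the first error seen.
func runLimited(n, limit int, task func(i int) error) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit)
	var (
		wg    sync.WaitGroup
		once  sync.Once
		first error
	)
	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := task(i); err != nil {
				once.Do(func() { first = err })
			}
		}(i)
	}
	wg.Wait()
	return first
}

// peakConcurrency reports the highest number of tasks that were observed
// running at the same time.
func peakConcurrency(n, limit int) int {
	var running, peak int32
	_ = runLimited(n, limit, func(int) error {
		cur := atomic.AddInt32(&running, 1)
		for {
			old := atomic.LoadInt32(&peak)
			if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
				break
			}
		}
		atomic.AddInt32(&running, -1)
		return nil
	})
	return int(peak)
}

func main() {
	limit := runtime.NumCPU() // cpu-bound work, so cores are the natural limit
	fmt.Println("limit:", limit, "peak:", peakConcurrency(100, limit))
}
```

limiting by core count fits here because the parallel work does not wait on the network, so extra goroutines would only add scheduling overhead.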
@bors bors bot closed this as completed in b545097 Oct 16, 2023
@github-project-automation github-project-automation bot moved this from 🏗 Doing to ✅ Done in Dev team kanban Oct 16, 2023
poszu pushed a commit that referenced this issue Oct 17, 2023