Assign specific jobs to dedicated workers #507

Closed
Nashtare opened this issue Aug 16, 2024 · 3 comments
Labels: crate: zero_bin, enhancement, performance

Comments

@Nashtare
Collaborator

We currently handle all proof jobs with the same pool of workers, regardless of their underlying type.
In practice, however, Txn / Segment proofs are much heavier and slower than any of the aggregation proofs.

We should consider a job assignment mechanism, probably relying on the routing keys of paladin's workers, to assign a particular job queue to a specific pool of workers. This would let us select dedicated hardware for the different proving jobs we run when proving blocks, typically much cheaper instances with less memory for the higher levels of aggregation.
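A minimal sketch of the idea, not paladin's actual API: each job kind maps to a routing key, and a worker only pulls from the queue matching its own key. The `Dispatcher` class, the job kind names, and the kind-to-key mapping are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical mapping from job kind to routing key (an assumption,
# not taken from paladin or zero-bin).
ROUTING_KEYS = {
    "txn": "heavy-proof",
    "segment": "heavy-proof",
    "aggregation": "light-proof",
}


class Dispatcher:
    """Keeps one FIFO queue per routing key."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def submit(self, kind, payload):
        """Enqueue a job on the queue for its kind and return the key used."""
        key = ROUTING_KEYS[kind]
        self.queues[key].append((kind, payload))
        return key

    def next_job(self, worker_key):
        """Return the next job for a worker's key, or None if its queue is empty."""
        queue = self.queues[worker_key]
        return queue.popleft() if queue else None
```

With this layout, a pool of cheap instances can be started with the light key while the large instances keep serving the heavy queue.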

@Nashtare Nashtare added enhancement New feature or request performance Performance improvement related changes labels Aug 16, 2024
@Nashtare Nashtare added this to the Performance Tuning milestone Aug 16, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Zero EVM Aug 16, 2024
@BGluth
Contributor

BGluth commented Aug 19, 2024

Yeah I think this is actually pretty important.

Are we able to reasonably estimate CPU/memory needs for each txn/segment proof at this point? Idk if we want to go with some simple discrete ranking of machines (e.g. light & heavy instances) or if we want to query the CPU & memory specs of each worker on startup and do something more dynamic.
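To make the two options concrete, here is a rough sketch of how they could combine: a worker reads its own specs on startup and derives a discrete class from them. The function names and the threshold parameters are hypothetical, and `local_specs` assumes a POSIX system that exposes the `SC_PAGE_SIZE` and `SC_PHYS_PAGES` sysconf names.

```python
import os


def local_specs():
    """Return (vcpus, mem_gb) for this machine (POSIX sysconf names assumed)."""
    vcpus = os.cpu_count() or 1
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return vcpus, mem_bytes / 1024**3


def classify(vcpus, mem_gb, heavy_min_vcpus, heavy_min_mem_gb):
    """Rank a machine as 'heavy' or 'light'; thresholds are caller-supplied."""
    if vcpus >= heavy_min_vcpus and mem_gb >= heavy_min_mem_gb:
        return "heavy"
    return "light"
```

The thresholds are left as parameters on purpose, since the thread has no benchmark data yet to pick them from.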

@Nashtare
Collaborator Author

We could do some benchmarking around the aggregation layers, but these should be fairly light (we don't need anything other than the base circuits loaded from the ProverState), and I'd assume the proving itself shouldn't take more than 4-5 GB of RAM. This would allow for a big drop in the memory-to-CPU ratio, whereas for segment proofs, t2d-60 (what we currently use) has a ratio of about 4 GB per vCPU (240 GB RAM / 60 vCPUs).
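The ratio arithmetic can be spelled out as below. The t2d-60 figures come from the comment above; the "light" instance shape (8 vCPUs, 8 GB) is purely a hypothetical example, not a measured or recommended configuration.

```python
def mem_per_vcpu(mem_gb: float, vcpus: int) -> float:
    """Memory-to-CPU ratio in GB per vCPU."""
    return mem_gb / vcpus


# t2d-60, currently used for segment proofs: 240 GB RAM, 60 vCPUs.
segment_ratio = mem_per_vcpu(240, 60)

# Hypothetical light instance for aggregation proofs: 8 GB RAM, 8 vCPUs.
# 8 GB leaves headroom over the assumed 4-5 GB peak for a single proof.
light_ratio = mem_per_vcpu(8, 8)

print(f"segment: {segment_ratio} GB/vCPU, light: {light_ratio} GB/vCPU")
```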

@Nashtare Nashtare added the crate: zero_bin Anything related to the zero-bin subcrates. label Aug 20, 2024
@muursh muursh moved this from Backlog to Todo in Zero EVM Aug 27, 2024
@muursh muursh moved this from Todo to In Progress in Zero EVM Aug 28, 2024
@temaniarpit27
Contributor

temaniarpit27 commented Sep 3, 2024

@Nashtare @BGluth
As discussed with @muursh, we have added a couple of features in this task. There are now 2 modes: default (works the way it does right now, where any job can run on any machine) and affinity (in this mode, we need to provide different routing keys on the different servers to enable their workers).

Affinity (split) mode details:
Leader args:

--worker-run-mode affinity

This will put segment proof jobs and block proof jobs in different queues.

Worker args:

--task-bus-routing-key heavy-proof/light-proof

This will start a worker that accepts messages only from the corresponding queue.

Currently, this version does not support running multiple queues on one machine.
Since this will be a multi-node architecture, we will also need to provide the correct AMQP URI on the leader and the workers to make sure they connect to the same RabbitMQ server.

Also, we will need either a RabbitMQ cluster or some kind of persistence for queues and messages.
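The behavior of the two modes can be summarized with the rough sketch below. It is an illustration only, not the zero-bin implementation: the function names and job kind strings are hypothetical, and the assumption that segment proofs go to `heavy-proof` while block proofs go to `light-proof` is inferred from the key names rather than stated above.

```python
HEAVY, LIGHT = "heavy-proof", "light-proof"


def job_routing_key(mode, job_kind):
    """Return the routing key a leader would attach to a job, or None."""
    if mode == "default":
        return None  # single shared queue, no routing
    if mode == "affinity":
        # Assumed mapping: segment proofs are heavy, block proofs are light.
        return HEAVY if job_kind == "segment" else LIGHT
    raise ValueError(f"unknown worker run mode: {mode}")


def worker_accepts(mode, worker_key, job_kind):
    """True if a worker started with `worker_key` would receive this job."""
    key = job_routing_key(mode, job_kind)
    return key is None or key == worker_key
```

In default mode every worker accepts every job; in affinity mode a worker sees only the jobs whose key matches the one it was started with, which is why one machine currently serves a single queue.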

@temaniarpit27 temaniarpit27 moved this from In Progress to Ready to Review in Zero EVM Sep 3, 2024
@temaniarpit27 temaniarpit27 moved this from Ready to Review to Ready To Merge in Zero EVM Oct 11, 2024
@temaniarpit27 temaniarpit27 moved this from Ready To Merge to Done in Zero EVM Oct 16, 2024