Paraffin, derived from the Latin phrase parum affinis, meaning "little related", is a Python package designed to run DVC stages in parallel. While DVC does not currently support this directly, Paraffin provides an effective workaround. For more details, refer to the DVC documentation on parallel stage execution.
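To illustrate why stages of a workflow can run in parallel at all, here is a minimal sketch that groups a dependency graph into batches of mutually independent stages. It is not Paraffin's implementation. The stage names and the `parallel_batches` helper are hypothetical, and it relies only on `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the set of stages it depends on.
stages = {
    "A_1": set(),
    "A_2": set(),
    "B": {"A_1", "A_2"},
    "C": {"B"},
}


def parallel_batches(graph):
    """Group stages into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every stage returned here has all of its dependencies finished,
        # so the stages in one batch could be executed concurrently.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for index, batch in enumerate(parallel_batches(stages)):
        print(index, batch)
```

In this sketch, `A_1` and `A_2` land in the same batch because neither depends on the other, while `B` and `C` have to wait for their upstream stages.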
Warning
paraffin is still very experimental. Do not use it for production workflows.
Install Paraffin via pip:
pip install paraffin
You can submit your current DVC workflow to a database file, paraffin.db, for later execution using the paraffin submit command.
Tip
The paraffin submit command supports globbing patterns.
paraffin submit C_AddNodeNumbers "A*"
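The quotes around "A*" keep the shell from expanding the pattern against file names, so the pattern reaches paraffin unchanged. As a rough sketch of what selecting stages by glob pattern means, the snippet below uses `fnmatch` from the Python standard library. Whether Paraffin uses `fnmatch` semantics internally is an assumption, and the stage names and the `select_stages` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical stage names, modeled on the names used in this README.
stage_names = [
    "A_X_AddNodeNumbers",
    "A_Y_AddNodeNumbers",
    "B_X_AddNodeNumbers",
    "C_AddNodeNumbers",
]


def select_stages(names, patterns):
    """Return the names that match at least one glob pattern (case-sensitive)."""
    return [
        name
        for name in names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Mirrors the arguments of: paraffin submit C_AddNodeNumbers "A*"
    print(select_stages(stage_names, ["C_AddNodeNumbers", "A*"]))
```

Under these assumptions, the exact name selects a single stage and "A*" selects every stage whose name starts with A.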
A submitted job will be executed by paraffin workers.
To start a worker, run the paraffin worker command.
The worker picks up all the jobs in its queue and exits once they are finished.
paraffin worker
Paraffin ships with a web application for visualizing progress. You can start it using:
paraffin ui
To fine-tune execution, you can assign stages to specific Celery queues, allowing you to manage execution across different environments or hardware setups.
Define queues in a paraffin.yaml file:
queue:
  "B_X*": BQueue
  "A_X_AddNodeNumbers": AQueue
Then start a worker that serves the queues you specify, such as AQueue and the default queue:
paraffin worker -q AQueue,default
All stages not assigned to a queue in paraffin.yaml are placed in the default queue.
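The mapping from stage names to queues can be pictured with the following sketch. It is not Paraffin's code: the `resolve_queue` helper is hypothetical, the `fnmatch` matching and the first-match-wins rule are assumptions, and only the two patterns and the `default` fallback come from the configuration described here.

```python
from fnmatch import fnmatchcase

# The same mapping as the "queue" section of the paraffin.yaml example.
queue_config = {
    "B_X*": "BQueue",
    "A_X_AddNodeNumbers": "AQueue",
}


def resolve_queue(stage_name, config, default="default"):
    """Return the queue of the first matching pattern, else the default queue."""
    # Assumption: patterns are checked in file order and the first match wins.
    for pattern, queue in config.items():
        if fnmatchcase(stage_name, pattern):
            return queue
    return default


if __name__ == "__main__":
    for name in ["B_X_AddNodeNumbers", "A_X_AddNodeNumbers", "C_AddNodeNumbers"]:
        print(name, "->", resolve_queue(name, queue_config))
```

A worker started with `-q AQueue,default` would, under these assumptions, process `A_X_AddNodeNumbers` and any unassigned stage, but leave stages routed to `BQueue` for another worker.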
Tip
If you are building Python-based workflows with DVC, consider trying our other project ZnTrack for a more Pythonic way to define workflows.