Compare optimized models vs. transformers models #194
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@lhoestq jfyi, about the latency/throughput measurement, here's what I got: https://github.com/fxmarty/optimum/blob/a111cfee49afc9bed68e18865442f2454d2556c3/optimum/runs_base.py#L172-L242 . Borrowed from https://github.com/huggingface/tune
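For reference, a minimal sketch of that kind of latency/throughput measurement, not the actual `runs_base.py` code, just the general pattern of a warmup phase followed by timed forward passes (function and parameter names here are made up for illustration):

```python
import time
import numpy as np

def measure_latency_throughput(forward_fn, inputs, warmup_runs=10, benchmark_duration_s=10.0):
    """Time repeated calls to forward_fn(inputs) and report latency/throughput."""
    # Warmup so that lazy initialization does not skew the measurements.
    for _ in range(warmup_runs):
        forward_fn(inputs)

    latencies_ms = []
    start = time.perf_counter()
    while time.perf_counter() - start < benchmark_duration_s:
        t0 = time.perf_counter()
        forward_fn(inputs)
        latencies_ms.append((time.perf_counter() - t0) * 1e3)

    total_s = time.perf_counter() - start
    return {
        "latency_mean_ms": float(np.mean(latencies_ms)),
        "latency_p90_ms": float(np.percentile(latencies_ms, 90)),
        "throughput_it_per_s": len(latencies_ms) / total_s,
    }
```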
(Branch force-pushed from bf9bd2e to 2a120e4.)
docs/source/benchmark.mdx (outdated):

    ## RunConfig

    [[autodoc]] optimum.utils.runs.RunConfig
To generate these docs, you'll need to:
- add `pydantic` to the required deps
- update the `__init__.py` file under `utils`

For the second point, this works (if you also exclude `optimum.runs_base.Run`):

    from .runs import RunConfig, Calibration, DatasetArgs, TaskArgs
    from .preprocessing.base import DatasetProcessing
Actually the first point was enough, and the doc for `optimum.runs_base.Run` is generated correctly as well.
Just one doubt: is adding an additional dependency a good idea? I think keeping `install_requires` to the bare minimum is best.
Generally, we try to keep external dependencies to a minimum, so ideally we would drop `pydantic` as a requirement if possible.
If not, you could add a new extras dependency, e.g. `benchmarks`, that users can install with `pip install optimum['benchmarks']`.
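For illustration, a minimal sketch of how such an optional extra could be declared in `setup.py`; the `benchmarks` name follows the suggestion above, and the exact dependency list is an assumption:

```python
# setup.py (sketch): keep install_requires minimal and gate pydantic
# behind an optional "benchmarks" extra.
from setuptools import setup, find_packages

setup(
    name="optimum",
    packages=find_packages(),
    install_requires=[
        "transformers",  # core deps stay minimal
    ],
    extras_require={
        # hypothetical extra, installed with: pip install optimum[benchmarks]
        "benchmarks": ["pydantic", "datasets"],
    },
)
```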
Understood. For now I went with your latter suggestion to save time; in a follow-up PR I will replace pydantic with dataclasses altogether.
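As a rough illustration of that follow-up (the field names below are hypothetical and not the actual `RunConfig` schema), a pydantic model could be turned into a plain dataclass along these lines:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical config: with dataclasses, validation that pydantic did
# automatically has to be written by hand, e.g. in __post_init__.
@dataclass
class ExampleRunConfig:
    model_name_or_path: str
    task: str
    batch_sizes: List[int] = field(default_factory=lambda: [1, 8])
    max_eval_samples: Optional[int] = None

    def __post_init__(self):
        supported = {"text-classification", "token-classification", "question-answering"}
        if self.task not in supported:
            raise ValueError(f"Unsupported task: {self.task}")
```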
Feedback welcome, notably on the design, code quality, etc.
This PR aims at introducing a unified way to benchmark transformers vs. optimized models that is backend-independent (in the sense that any backend can be plugged in for inference and evaluation) and code-free (in the sense that the user does not need to write code to start runs and evaluate them).
The main contributions are helper classes and methods for data preprocessing, inference and evaluation, introduced in several files (a rough sketch of how these layers fit together follows the list of supported tasks below):

* `optimum/runs_base.py`: general methods; this should be backend-agnostic.
* `optimum/utils/preprocessing/`: handles loading and preprocessing datasets, running inference with pipelines, and running evaluation. This should be backend-agnostic.
* `optimum/onnxruntime/runs/`: OnnxRuntime-specific methods.

For now, dataset preprocessing and evaluation are task-specific; the supported tasks are:
* `text-classification`
* `token-classification`
* `question-answering`
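To make the backend-agnostic vs. backend-specific split concrete, here is a hypothetical sketch of the layering; only `optimum.runs_base.Run` is a name from this PR, while the method names and the subclass shown are assumptions for illustration:

```python
# Hypothetical sketch of the layering described above: the base Run is
# backend-agnostic, and a backend plugs in by providing its own model
# loading / pipeline construction.
class Run:
    def __init__(self, run_config):
        self.run_config = run_config

    def load_datasets(self):
        # backend-agnostic: dataset loading and preprocessing (assumed method name)
        ...

    def launch(self):
        # backend-agnostic: run evaluation and time measurements, return results
        ...


class OnnxRuntimeRun(Run):  # assumed name, for illustration only
    def __init__(self, run_config):
        super().__init__(run_config)
        # backend-specific: export/optimize the model and build an
        # ONNX Runtime inference pipeline here
        ...
```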
As for the evaluation of transformers models, I believe there is some duplicate work with what exists in the AutoTrain backend and what is being done in https://github.com/huggingface/evaluate. However, since my understanding is that supporting Optimum-based inference within AutoTrain is not a priority, it makes sense to me to have a common implementation to evaluate transformers/optimized models so that they are comparable. I hope we can minimize duplicate efforts.
I used pipelines for inference for the general metrics, and `ORTModel.forward()` to measure latency/throughput.
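For context, a hedged sketch of the pipeline-based metric path; the model, dataset and manual accuracy computation below are placeholders, not what the run configs actually use:

```python
from datasets import load_dataset
from transformers import pipeline

# Placeholder model and dataset; in the PR these come from the run config.
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
dataset = load_dataset("glue", "sst2", split="validation[:100]")

predictions = [pipe(example["sentence"])[0]["label"] for example in dataset]

# Map the pipeline's string labels back to the dataset's integer labels.
label_map = {"NEGATIVE": 0, "POSITIVE": 1}
accuracy = sum(
    label_map[pred] == ref for pred, ref in zip(predictions, dataset["label"])
) / len(predictions)
print(f"accuracy: {accuracy:.3f}")
```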
Tasks before (or after) merge:
* Whether to use `Trainer.evaluate()` instead of an explicit loop for evaluation --> I think it doesn't make sense: there is a lot of abstraction in pipelines already, we should make use of it.
* Use the `train-eval-index` metadata from datasets to auto-infer the data and label columns (see e.g. Autoeval config datasets#4234) (left to a next PR).
* `optimum/onnxruntime/modeling_ort.py`, lines 323 to 327 in cf91bd7.
This, with some additional work, should close #128.