Compare optimized models vs. transformers models #194

Merged: 31 commits merged into huggingface:main on May 25, 2022

Conversation

@fxmarty (Contributor) commented May 17, 2022

Feedback is welcome, notably on the design, code quality, etc.

This PR aims at introducing a unified way to benchmark transformers models vs. optimized models. The approach is backend-independent (any backend can be plugged in for inference and evaluation) and code-free (the user does not need to write code to start runs and evaluate them).

The main contribution is to introduce helper classes and methods for data preprocessing, inference, and evaluation, split across several files:
* optimum/runs_base.py: general methods; this should be backend-agnostic.
* optimum/utils/preprocessing/: handles loading and preprocessing datasets, running inference with pipelines, and running evaluation. This should be backend-agnostic.
* optimum/onnxruntime/runs/: ONNX Runtime-specific methods
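
To make the intended layering more concrete, here is a rough sketch of the pattern. Everything below other than the RunConfig/Run names is hypothetical and only illustrates the backend-agnostic vs. backend-specific split; it is not the actual Optimum API:

    # Hypothetical sketch of the backend-agnostic / backend-specific split.
    # Names and signatures are illustrative, not the real Optimum classes.
    from abc import ABC, abstractmethod

    class Run(ABC):
        """Backend-agnostic run: holds the config and drives the whole evaluation."""

        def __init__(self, run_config: dict):
            self.config = run_config

        @abstractmethod
        def _load_model(self):
            """Backend-specific model loading."""

        def launch(self):
            model = self._load_model()
            # The real implementation would preprocess the dataset, run inference
            # through pipelines, compute metrics and measure latency/throughput here.
            return {"backend_model": type(model).__name__, "task": self.config["task"]}

    class OnnxRuntimeRun(Run):
        """ONNX Runtime-specific run: only what differs per backend lives here."""

        def _load_model(self):
            return object()  # stand-in for an ORTModel built from self.config

    print(OnnxRuntimeRun({"task": "text-classification"}).launch())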

For now, dataset preprocessing and evaluation are task-specific; the supported tasks are:

  • text-classification
  • token-classification
  • question-answering

As for the evaluation of transformers models, I believe there is some overlap with what exists in the AutoTrain backend and with what is being done in https://github.com/huggingface/evaluate. However, since my understanding is that supporting Optimum-based inference within AutoTrain is not a priority, it makes sense to me to have a common implementation here to evaluate transformers and optimized models so that they are comparable. I hope we can minimize duplicate efforts.

For the general metrics I used pipelines for inference, and I used ORTModel.forward() to measure latency/throughput.
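
For illustration, the pipeline-based metric computation looks roughly like the sketch below. This is a minimal example, not the actual run code: the model name, the toy inputs, and the use of the evaluate library are assumptions.

    from transformers import pipeline
    from evaluate import load

    # A vanilla transformers pipeline; an optimized backend would plug its own
    # model into an equivalent pipeline while keeping the same evaluation loop.
    pipe = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    texts = ["A great movie.", "A complete waste of time."]
    references = [1, 0]  # toy labels: POSITIVE = 1, NEGATIVE = 0 for this model

    outputs = pipe(texts)
    label2id = pipe.model.config.label2id
    predictions = [label2id[out["label"]] for out in outputs]

    accuracy = load("accuracy")
    print(accuracy.compute(predictions=predictions, references=references))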

Tasks before (or after) merge

  • Documentation (Added in an "API Reference" section)
  • Test on several datasets
  • See if it would make sense to use Trainer.evaluate() instead of an explicit loop for evaluation --> I don't think it does; there is already a lot of abstraction in pipelines, and we should make use of it.
  • Make use of the train-eval-index metadata from datasets to auto-infer the data and label columns (see e.g. Autoeval config datasets#4234) (left to a next PR)
  • Support multi-column data (two columns would be sufficient, I guess; see Bert that receives text triplet as an input transformers#8573)
  • Document node exclusion (left to next PR)
  • Support node exclusion for dynamic quantization for OnnxRuntime (implemented in Allow onnxruntime quantization preprocessor for dynamic quantization #196)
  • Avoid including the PyTorch-to-NumPy conversion in the time measurements (left to a next PR), as in:
    # converts pytorch inputs into numpy inputs for onnx
    onnx_inputs = {
        "input_ids": input_ids.cpu().detach().numpy(),
        "attention_mask": attention_mask.cpu().detach().numpy(),
    }
  • Still some work to distinguish backend-agnostic code vs. backend-specific code
  • Clean up all remaining # TODO comments
  • Unit tests / github workflows (left to next PR)

With some additional work, this should close #128.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@fxmarty (Contributor, Author) commented May 19, 2022

@lhoestq FYI, regarding the latency/throughput measurement, here is what I ended up with: https://github.com/fxmarty/optimum/blob/a111cfee49afc9bed68e18865442f2454d2556c3/optimum/runs_base.py#L172-L242 . It is borrowed from https://github.com/huggingface/tune.
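
In spirit, the measurement is a warm-up phase followed by a timed loop around the forward call, roughly as in the sketch below (simplified, with a dummy forward pass and placeholder input shapes standing in for ORTModel.forward; the actual implementation is in the runs_base.py link above). As noted in the task list, input conversions such as PyTorch-to-NumPy should ideally happen outside the timed region:

    import time
    import numpy as np
    import torch

    def measure(forward_fn, inputs, num_runs=30, warmup_runs=5):
        """Return per-call latencies (in seconds) of repeated forward calls."""
        # Any input conversion (e.g. torch tensors -> numpy for ONNX Runtime)
        # should be done before calling this function so it is not timed.
        for _ in range(warmup_runs):
            forward_fn(**inputs)
        latencies = []
        for _ in range(num_runs):
            start = time.perf_counter()
            forward_fn(**inputs)
            latencies.append(time.perf_counter() - start)
        return latencies

    def dummy_forward(input_ids, attention_mask):
        return input_ids.sum() + attention_mask.sum()  # stand-in for a model

    inputs = {
        "input_ids": torch.ones(1, 128, dtype=torch.long),
        "attention_mask": torch.ones(1, 128, dtype=torch.long),
    }
    latencies = measure(dummy_forward, inputs)
    print(
        f"mean latency: {np.mean(latencies) * 1e3:.2f} ms, "
        f"throughput: {len(latencies) / np.sum(latencies):.1f} samples/s"
    )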

@fxmarty force-pushed the runs-only branch 2 times, most recently from bf9bd2e to 2a120e4 on May 20, 2022.

## RunConfig

[[autodoc]] optimum.utils.runs.RunConfig
Member commented:

To generate these docs, you'll need to:

  • add pydantic to the required deps
  • update the __init__.py file under utils

For the second point, this works (if you also exclude optimum.runs_base.Run):

from .runs import RunConfig, Calibration, DatasetArgs, TaskArgs
from .preprocessing.base import DatasetProcessing

@fxmarty (Contributor, Author) replied:

Actually the first point was enough, and the doc for optimum.runs_base.Run is generated correctly as well.

Just one doubt: is adding an additional dependency a good idea? I think keeping install_requires to the bare minimum is best.

Member replied:

Generally, we try to keep external dependencies to a minimum, so ideally we would drop pydantic as a requirement if possible.

If not, you could add a new extras dep, e.g. benchmarks, that users can install with pip install optimum[benchmarks].
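
For reference, such an extras entry would look roughly like this in setup.py (a sketch; the exact extra name and version pin are up to the PR):

    # setup.py sketch: keep pydantic out of the core deps, expose it as an extra
    from setuptools import find_packages, setup

    setup(
        name="optimum",
        packages=find_packages(),
        install_requires=[
            # core dependencies only
        ],
        extras_require={
            # installed via: pip install optimum[benchmarks]
            "benchmarks": ["pydantic"],
        },
    )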

@fxmarty (Contributor, Author) replied:

Understood. For now I went with your latter suggestion for lack of time; in a next PR I will replace pydantic with dataclasses altogether.
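
As a rough idea of what dropping pydantic could look like, a dataclass-based config might be along these lines (the field names are hypothetical, for illustration only; the validation that pydantic provided would have to be done manually):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class RunConfig:
        """Plain-dataclass alternative to the pydantic model (illustrative fields)."""
        task: str
        dataset: str
        metrics: List[str] = field(default_factory=list)
        max_eval_samples: Optional[int] = None

        def __post_init__(self):
            # Manual validation replacing what pydantic did automatically.
            supported = ("text-classification", "token-classification", "question-answering")
            if self.task not in supported:
                raise ValueError(f"Unsupported task: {self.task}")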

(A review thread on docs/source/benchmark.mdx was marked outdated and resolved.)
@mfuntowicz self-requested a review on May 25, 2022 13:08.
@mfuntowicz (Member) left a comment:

LGTM, thanks @lewtun @fxmarty for tracking the issue with the doc, we'll do the necessary things.

@mfuntowicz merged commit 1b98940 into huggingface:main on May 25, 2022.
Successfully merging this pull request may close the following issue: Create benchmarking suite for optimised models.