Trainer - deprecate tokenizer for processing_class (#32385)
* Trainer - deprecate tokenizer for processing_class

* Extend change across Seq2Seq trainer and docs

* Add tests

* Update to FutureWarning and add deprecation version
amyeroberts authored Oct 2, 2024
1 parent e7c8af7 commit b7474f2
Showing 99 changed files with 569 additions and 442 deletions.
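
The commit message describes deprecating the `tokenizer` argument in favour of `processing_class` and emitting a `FutureWarning`. The following is a minimal sketch of how such a keyword rename is commonly handled, not the actual `Trainer` implementation: the class `SketchTrainer`, its signature, and the warning text are hypothetical and chosen only for illustration.

```python
import warnings


class SketchTrainer:
    """Hypothetical stand-in for a trainer class; NOT the real transformers.Trainer."""

    def __init__(self, model=None, processing_class=None, tokenizer=None):
        if tokenizer is not None:
            # Refuse ambiguous calls that pass both the old and the new name.
            if processing_class is not None:
                raise ValueError(
                    "Pass only `processing_class`; `tokenizer` is its deprecated alias."
                )
            # Assumed wording; the real message and removal version may differ.
            warnings.warn(
                "`tokenizer` is deprecated; use `processing_class` instead.",
                FutureWarning,
            )
            processing_class = tokenizer
        self.model = model
        self.processing_class = processing_class


# Old call sites keep working but warn; new call sites are silent.
legacy = SketchTrainer(tokenizer="my-tokenizer")
current = SketchTrainer(processing_class="my-tokenizer")
```

Under this pattern, existing code that passes `tokenizer=` continues to run during the deprecation window, which is why the documentation changes below only rename the keyword at each call site.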
6 changes: 3 additions & 3 deletions docs/source/en/hpo_train.md
@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.

# Hyperparameter Search using Trainer API

🤗 Transformers provides a [`Trainer`] class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The [`Trainer`] provides API for hyperparameter search. This doc shows how to enable it in example.
🤗 Transformers provides a [`Trainer`] class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The [`Trainer`] provides API for hyperparameter search. This doc shows how to enable it in example.

## Hyperparameter Search backend

@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.

you should install them before using them as the hyperparameter search backend
```bash
pip install optuna/sigopt/wandb/ray[tune]
pip install optuna/sigopt/wandb/ray[tune]
```

## How to enable Hyperparameter search in example
@@ -112,7 +112,7 @@ Create a [`Trainer`] with your `model_init` function, training arguments, traini
... train_dataset=small_train_dataset,
... eval_dataset=small_eval_dataset,
... compute_metrics=compute_metrics,
... tokenizer=tokenizer,
... processing_class=tokenizer,
... model_init=model_init,
... data_collator=data_collator,
... )
8 changes: 4 additions & 4 deletions docs/source/en/model_doc/mamba.md
@@ -39,8 +39,8 @@ The original code can be found [here](https://github.com/state-spaces/mamba).

# Usage

### A simple generation example:
```python
### A simple generation example:
```python
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch

@@ -55,7 +55,7 @@ print(tokenizer.batch_decode(out))
### Peft finetuning
The slow version is not very stable for training, and the fast one needs `float32`!

```python
```python
from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
@@ -80,7 +80,7 @@ lora_config = LoraConfig(
)
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
processing_class=tokenizer,
args=training_args,
peft_config=lora_config,
train_dataset=dataset,
8 changes: 4 additions & 4 deletions docs/source/en/quicktour.md
@@ -111,7 +111,7 @@ Load an audio dataset (see the 🤗 Datasets [Quick Start](https://huggingface.c
>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
```

You need to make sure the sampling rate of the dataset matches the sampling
You need to make sure the sampling rate of the dataset matches the sampling
rate [`facebook/wav2vec2-base-960h`](https://huggingface.co/facebook/wav2vec2-base-960h) was trained on:

```py
@@ -174,7 +174,7 @@ If you can't find a model for your use-case, you'll need to finetune a pretraine

<Youtube id="AhChOFRegn4"/>

Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and it's associated preprocessing class.
Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and it's associated preprocessing class.

Let's return to the example from the previous section and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].

@@ -485,7 +485,7 @@ Now gather all these classes in [`Trainer`]:
... args=training_args,
... train_dataset=dataset["train"],
... eval_dataset=dataset["test"],
... tokenizer=tokenizer,
... processing_class=tokenizer,
... data_collator=data_collator,
... ) # doctest: +SKIP
```
@@ -502,7 +502,7 @@ For tasks - like translation or summarization - that use a sequence-to-sequence

</Tip>

You can customize the training loop behavior by subclassing the methods inside [`Trainer`]. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be subclassed.
You can customize the training loop behavior by subclassing the methods inside [`Trainer`]. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be subclassed.

The other way to customize the training loop is by using [Callbacks](./main_classes/callback). You can use callbacks to integrate with other libraries and inspect the training loop to report on progress or stop the training early. Callbacks do not modify anything in the training loop itself. To customize something like the loss function, you need to subclass the [`Trainer`] instead.

4 changes: 2 additions & 2 deletions docs/source/en/tasks/asr.md
@@ -281,7 +281,7 @@ At this point, only three steps remain:
... args=training_args,
... train_dataset=encoded_minds["train"],
... eval_dataset=encoded_minds["test"],
... tokenizer=processor,
... processing_class=processor,
... data_collator=data_collator,
... compute_metrics=compute_metrics,
... )
@@ -368,4 +368,4 @@ Get the predicted `input_ids` with the highest probability, and use the processo
['I WOUL LIKE O SET UP JOINT ACOUNT WTH Y PARTNER']
```
</pt>
</frameworkcontent>
</frameworkcontent>
8 changes: 4 additions & 4 deletions docs/source/en/tasks/audio_classification.md
@@ -98,8 +98,8 @@ Take a look at an example now:

There are two fields:

- `audio`: a 1-dimensional `array` of the speech signal that must be called to load and resample the audio file.
- `intent_class`: represents the class id of the speaker's intent.
- `audio`: a 1-dimensional `array` of the speech signal that must be called to load and resample the audio file.
- `intent_class`: represents the class id of the speaker's intent.

To make it easier for the model to get the label name from the label id, create a dictionary that maps the label name to an integer and vice versa:

@@ -235,7 +235,7 @@ At this point, only three steps remain:
... args=training_args,
... train_dataset=encoded_minds["train"],
... eval_dataset=encoded_minds["test"],
... tokenizer=feature_extractor,
... processing_class=feature_extractor,
... compute_metrics=compute_metrics,
... )

@@ -321,4 +321,4 @@ Get the class with the highest probability, and use the model's `id2label` mappi
'cash_deposit'
```
</pt>
</frameworkcontent>
</frameworkcontent>
4 changes: 2 additions & 2 deletions docs/source/en/tasks/document_question_answering.md
@@ -420,7 +420,7 @@ Finally, bring everything together, and call [`~Trainer.train`]:
... data_collator=data_collator,
... train_dataset=encoded_train_dataset,
... eval_dataset=encoded_test_dataset,
... tokenizer=processor,
... processing_class=processor,
... )

>>> trainer.train()
@@ -489,4 +489,4 @@ which token is at the end of the answer. Both have shape (batch_size, sequence_l

>>> processor.tokenizer.decode(encoding.input_ids.squeeze()[predicted_start_idx : predicted_end_idx + 1])
'lee a. waller'
```
```
2 changes: 1 addition & 1 deletion docs/source/en/tasks/image_classification.md
@@ -317,7 +317,7 @@ At this point, only three steps remain:
... data_collator=data_collator,
... train_dataset=food["train"],
... eval_dataset=food["test"],
... tokenizer=image_processor,
... processing_class=image_processor,
... compute_metrics=compute_metrics,
... )

@@ -19,25 +19,25 @@ rendered properly in your Markdown viewer.

Knowledge distillation is a technique used to transfer knowledge from a larger, more complex model (teacher) to a smaller, simpler model (student). To distill knowledge from one model to another, we take a pre-trained teacher model trained on a certain task (image classification for this case) and randomly initialize a student model to be trained on image classification. Next, we train the student model to minimize the difference between it's outputs and the teacher's outputs, thus making it mimic the behavior. It was first introduced in [Distilling the Knowledge in a Neural Network by Hinton et al](https://arxiv.org/abs/1503.02531). In this guide, we will do task-specific knowledge distillation. We will use the [beans dataset](https://huggingface.co/datasets/beans) for this.

This guide demonstrates how you can distill a [fine-tuned ViT model](https://huggingface.co/merve/vit-mobilenet-beans-224) (teacher model) to a [MobileNet](https://huggingface.co/google/mobilenet_v2_1.4_224) (student model) using the [Trainer API](https://huggingface.co/docs/transformers/en/main_classes/trainer#trainer) of 🤗 Transformers.
This guide demonstrates how you can distill a [fine-tuned ViT model](https://huggingface.co/merve/vit-mobilenet-beans-224) (teacher model) to a [MobileNet](https://huggingface.co/google/mobilenet_v2_1.4_224) (student model) using the [Trainer API](https://huggingface.co/docs/transformers/en/main_classes/trainer#trainer) of 🤗 Transformers.

Let's install the libraries needed for distillation and evaluating the process.
Let's install the libraries needed for distillation and evaluating the process.

```bash
pip install transformers datasets accelerate tensorboard evaluate --upgrade
```

In this example, we are using the `merve/beans-vit-224` model as teacher model. It's an image classification model, based on `google/vit-base-patch16-224-in21k` fine-tuned on beans dataset. We will distill this model to a randomly initialized MobileNetV2.

We will now load the dataset.
We will now load the dataset.

```python
from datasets import load_dataset

dataset = load_dataset("beans")
```

We can use an image processor from either of the models, as in this case they return the same output with same resolution. We will use the `map()` method of `dataset` to apply the preprocessing to every split of the dataset.
We can use an image processor from either of the models, as in this case they return the same output with same resolution. We will use the `map()` method of `dataset` to apply the preprocessing to every split of the dataset.

```python
from transformers import AutoImageProcessor
@@ -93,15 +93,15 @@ class ImageDistilTrainer(Trainer):
return (loss, student_output) if return_outputs else loss
```

We will now login to Hugging Face Hub so we can push our model to the Hugging Face Hub through the `Trainer`.
We will now login to Hugging Face Hub so we can push our model to the Hugging Face Hub through the `Trainer`.

```python
from huggingface_hub import notebook_login

notebook_login()
```

Let's set the `TrainingArguments`, the teacher model and the student model.
Let's set the `TrainingArguments`, the teacher model and the student model.

```python
from transformers import AutoModelForImageClassification, MobileNetV2Config, MobileNetV2ForImageClassification
@@ -164,7 +164,7 @@ trainer = ImageDistilTrainer(
train_dataset=processed_datasets["train"],
eval_dataset=processed_datasets["validation"],
data_collator=data_collator,
tokenizer=teacher_processor,
processing_class=teacher_processor,
compute_metrics=compute_metrics,
temperature=5,
lambda_param=0.5
2 changes: 1 addition & 1 deletion docs/source/en/tasks/multiple_choice.md
@@ -270,7 +270,7 @@ At this point, only three steps remain:
... args=training_args,
... train_dataset=tokenized_swag["train"],
... eval_dataset=tokenized_swag["validation"],
... tokenizer=tokenizer,
... processing_class=tokenizer,
... data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
... compute_metrics=compute_metrics,
... )
6 changes: 3 additions & 3 deletions docs/source/en/tasks/object_detection.md
@@ -340,15 +340,15 @@ with `pixel_values`, a tensor with `pixel_mask`, and `labels`.
[ 0.0741, 0.0741, 0.0741, ..., 0.0741, 0.0741, 0.0741],
[ 0.0741, 0.0741, 0.0741, ..., 0.0741, 0.0741, 0.0741],
[ 0.0741, 0.0741, 0.0741, ..., 0.0741, 0.0741, 0.0741]],

[[ 1.6232, 1.6408, 1.6583, ..., 0.8704, 1.0105, 1.1331],
[ 1.6408, 1.6583, 1.6758, ..., 0.8529, 0.9930, 1.0980],
[ 1.6933, 1.6933, 1.7108, ..., 0.8179, 0.9580, 1.0630],
...,
[ 0.2052, 0.2052, 0.2052, ..., 0.2052, 0.2052, 0.2052],
[ 0.2052, 0.2052, 0.2052, ..., 0.2052, 0.2052, 0.2052],
[ 0.2052, 0.2052, 0.2052, ..., 0.2052, 0.2052, 0.2052]],

[[ 1.8905, 1.9080, 1.9428, ..., -0.1487, -0.0964, -0.0615],
[ 1.9254, 1.9428, 1.9603, ..., -0.1661, -0.1138, -0.0790],
[ 1.9777, 1.9777, 1.9951, ..., -0.2010, -0.1138, -0.0790],
@@ -569,7 +569,7 @@ Finally, bring everything together, and call [`~transformers.Trainer.train`]:
... args=training_args,
... train_dataset=cppe5["train"],
... eval_dataset=cppe5["validation"],
... tokenizer=image_processor,
... processing_class=image_processor,
... data_collator=collate_fn,
... compute_metrics=eval_compute_metrics_fn,
... )
2 changes: 1 addition & 1 deletion docs/source/en/tasks/question_answering.md
@@ -225,7 +225,7 @@ At this point, only three steps remain:
... args=training_args,
... train_dataset=tokenized_squad["train"],
... eval_dataset=tokenized_squad["test"],
... tokenizer=tokenizer,
... processing_class=tokenizer,
... data_collator=data_collator,
... )

2 changes: 1 addition & 1 deletion docs/source/en/tasks/sequence_classification.md
@@ -190,7 +190,7 @@ At this point, only three steps remain:
... args=training_args,
... train_dataset=tokenized_imdb["train"],
... eval_dataset=tokenized_imdb["test"],
... tokenizer=tokenizer,
... processing_class=tokenizer,
... data_collator=data_collator,
... compute_metrics=compute_metrics,
... )
2 changes: 1 addition & 1 deletion docs/source/en/tasks/summarization.md
@@ -214,7 +214,7 @@ At this point, only three steps remain:
... args=training_args,
... train_dataset=tokenized_billsum["train"],
... eval_dataset=tokenized_billsum["test"],
... tokenizer=tokenizer,
... processing_class=tokenizer,
... data_collator=data_collator,
... compute_metrics=compute_metrics,
... )