
do not support unknown special tokens #234

Closed
yxchng opened this issue Dec 8, 2024 · 1 comment
yxchng commented Dec 8, 2024

How do I resolve this error when fine-tuning?

ValueError: For now, we do not support unknown special tokens
In the future, if there is a need for this, we can add special tokens to the tokenizer
starting from rank 100261 - 100263 and then 100266 - 100275.
And finally, we can re-construct the enc object back
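The error message hints at how the fix would work: the base vocabulary leaves a few id slots (ranks) unassigned, and new special tokens would be mapped onto those free ranks before the encoder is rebuilt. A minimal sketch of that mapping, using the rank ranges named in the error; the token names are hypothetical placeholders, not part of any real encoding:

```python
# Sketch: assign new special tokens to the free rank ranges the error
# message mentions (100261-100263 and 100266-100275). The token names
# below are hypothetical placeholders.
free_ranks = list(range(100261, 100264)) + list(range(100266, 100276))

new_tokens = ["<|custom_1|>", "<|custom_2|>", "<|custom_3|>"]
if len(new_tokens) > len(free_ranks):
    raise ValueError("not enough free ranks for the requested special tokens")

# Map each new token onto the next unused rank.
special_tokens = {tok: rank for tok, rank in zip(new_tokens, free_ranks)}
print(special_tokens)
# {'<|custom_1|>': 100261, '<|custom_2|>': 100262, '<|custom_3|>': 100263}
```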
@leestott (Contributor) commented:

This error typically occurs when the tokenizer encounters special tokens that it doesn't recognize. Here are some steps to resolve this issue:

  1. Identify the Unknown Tokens: Check your dataset to identify any special tokens that are not recognized by the tokenizer.

  2. Add Special Tokens to the Tokenizer:

    • You can add the special tokens to the tokenizer manually. Here's an example in Python using the Hugging Face transformers library:

      from transformers import AutoTokenizer
      
      tokenizer = AutoTokenizer.from_pretrained('your-model-name')
      
      special_tokens_dict = {'additional_special_tokens': ['<special1>', '<special2>', '<special3>']}
      num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
      
      print(f"Added {num_added_toks} special tokens.")
  3. Reconstruct the Encoding Object: After adding the special tokens, you may need to re-encode your dataset to ensure the new tokens are properly integrated.

  4. Update the Model: If you're using a pre-trained model, make sure to update it to recognize the new special tokens:

    from transformers import AutoModelForSequenceClassification
    
    model = AutoModelForSequenceClassification.from_pretrained('your-model-name')
    model.resize_token_embeddings(len(tokenizer))
  5. Re-run the Fine-tuning Process: With the tokenizer and model updated, you can re-run your fine-tuning process.
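The key dependency in the steps above is between step 2 and step 4: adding tokens grows the vocabulary, so the model's embedding table must grow to match, or the new ids would index past it. A toy illustration of that relationship, using simplified stand-in classes rather than the Hugging Face API:

```python
# Toy stand-ins for a tokenizer and model, illustrating why step 4
# (resizing embeddings) must follow step 2 (adding special tokens).
class ToyTokenizer:
    def __init__(self, vocab):
        self.vocab = dict(vocab)

    def add_special_tokens(self, tokens):
        added = 0
        for tok in tokens:
            if tok not in self.vocab:
                self.vocab[tok] = len(self.vocab)  # next free id
                added += 1
        return added

    def __len__(self):
        return len(self.vocab)


class ToyModel:
    def __init__(self, vocab_size, dim=4):
        # one embedding row per vocabulary entry
        self.embeddings = [[0.0] * dim for _ in range(vocab_size)]

    def resize_token_embeddings(self, new_size):
        dim = len(self.embeddings[0])
        while len(self.embeddings) < new_size:
            self.embeddings.append([0.0] * dim)


tokenizer = ToyTokenizer({"hello": 0, "world": 1})
model = ToyModel(len(tokenizer))

tokenizer.add_special_tokens(["<special1>", "<special2>"])
# Without this resize, the new ids 2 and 3 would have no embedding row.
model.resize_token_embeddings(len(tokenizer))
assert len(model.embeddings) == len(tokenizer)
```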

I would also suggest looking at the Phi-3 Fine-tuning with Microsoft Olive lab in the provided resources: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/olive-lab/readme.md

@leestott closed this as completed Jan 8, 2025