Add ability to instantiate tokenizers from the Hub (from_pretrained) #780
Shouldn't we add at least one test per binding, plus one for Rust?
It will depend on the network and the Hub, which is never great, but I think it could be helpful.
Also, we probably need to add it to the docs, no?
fn default() -> Self {
    Self {
        revision: "main".into(),
        user_agent: HashMap::new(),
Shouldn't this be "tokenizers", "rust", or something similar by default?
It is: by default the user agent always contains tokenizers/{RUST_VERSION}; this field is for additional items.
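A minimal sketch of the behaviour described in this thread, not the PR's actual implementation: the defaults leave `user_agent` empty, and a fixed `tokenizers/{version}` prefix is always emitted ahead of any additional items. The struct here is a stand-in that mirrors only the two fields visible in the diff, and `build_user_agent`, its `version` parameter, and the `key/value; ` separator format are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for the parameters struct in the diff; only the two fields
/// visible there are reproduced.
#[derive(Debug, Clone)]
pub struct FromPretrainedParameters {
    pub revision: String,
    pub user_agent: HashMap<String, String>,
}

impl Default for FromPretrainedParameters {
    fn default() -> Self {
        Self {
            revision: "main".into(),
            // Empty by default: the base "tokenizers/{version}" entry is
            // added separately, so this map only carries extra items.
            user_agent: HashMap::new(),
        }
    }
}

/// Hypothetical helper: builds the final user-agent string.
/// The "key/value; key/value" layout is an assumption, not the PR's format.
pub fn build_user_agent(version: &str, extra: &HashMap<String, String>) -> String {
    // The base entry is always present, whatever the caller supplies.
    let mut parts = vec![format!("tokenizers/{}", version)];

    // Sort the additional items so the output is deterministic.
    let mut items: Vec<(&String, &String)> = extra.iter().collect();
    items.sort();
    parts.extend(items.into_iter().map(|(k, v)| format!("{}/{}", k, v)));

    parts.join("; ")
}
```

Under these assumptions, a binding would insert its own entry (for example a `bindings` key) into `user_agent`, and the base `tokenizers/...` entry would still appear first.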
Agreed
I've already added it in every place I thought about. Do you have something specific in mind?
I missed it in |
Similarly to what exists in transformers, add the ability to instantiate tokenizers from tokenizer.json files uploaded to the Hugging Face Hub.
Example: