BPE dropout not working as expected #201
Comments
This indeed appears to be a bug.
Since I just started looking at this codebase, I'm not clear what the best way of fixing this might be. But if someone is willing to point me in the right direction, I'd be happy to offer my help and implement a fix.
Yes indeed, the dropout parameter does not get forwarded to the new BPE model created during training. This is a bit tricky though, and it may require a lot of changes to fix properly.
Any updates? Or does anyone have a workaround for this (other than the topic starter's solution)?
Yes, as a workaround you can still reload the model like this:

```python
from tokenizers.models import BPE

files = tokenizer.model.save("./", "workaround")
tokenizer.model = BPE.from_files(*files, dropout=0.1, unk_token="[UNK]")
```
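For reference, a quick way to check that dropout is active after this reassignment is to encode the same string a few times; with dropout, the segmentation should occasionally differ between calls. (A minimal sketch, assuming `tokenizer` is the trained tokenizer from the snippet above.)

```python
# With dropout > 0, some merges are randomly skipped, so repeated encodings
# of the same text should not always produce the same tokens.
for _ in range(3):
    print(tokenizer.encode("checking BPE dropout").tokens)
```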
Hi @n1t0! I'm using `RobertaTokenizerFast`, but this trick doesn't seem to work. Do you have any idea why?
Thanks!
In your second example, I think you should be reassigning on
Ahh my bad. Yes, it works! Thanks!
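Presumably the fix here is to reassign the model on the fast tokenizer's wrapped `tokenizers` object rather than on the `transformers` wrapper itself. A rough sketch of that idea, assuming the `transformers` fast-tokenizer API (the `backend_tokenizer` attribute) and the same `BPE.from_files` constructor used earlier in this thread:

```python
from tokenizers.models import BPE
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Save the current BPE model (vocab + merges) and reload it with dropout,
# reassigning it on the wrapped tokenizers.Tokenizer, not on the wrapper.
files = tokenizer.backend_tokenizer.model.save("./", "workaround")
tokenizer.backend_tokenizer.model = BPE.from_files(*files, dropout=0.1)
```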
I'm using `ByteLevelBPETokenizer` together with `fastai.text` to make a text classifier in Python, and I'm experimenting with BPE dropout. When I create an empty model and train it on my own text corpus, BPE dropout doesn't seem to work.
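A minimal sketch of this first setup (the corpus path, vocab size, and dropout value are placeholders, not taken from the original report):

```python
from tokenizers import ByteLevelBPETokenizer

# Create an empty byte-level BPE tokenizer with dropout and train it on a corpus.
tokenizer = ByteLevelBPETokenizer(dropout=0.1)
tokenizer.train(files=["corpus.txt"], vocab_size=5000, min_frequency=2)

# With dropout working, repeated encodings of the same text should sometimes
# be segmented differently; here they come out identical every time.
for _ in range(3):
    print(tokenizer.encode("experimenting with BPE dropout").tokens)
```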
However, when I create a `ByteLevelBPETokenizer` from vocab and merges files, BPE dropout does work as I expect (probabilistic merging). Did I get something wrong with the first method, or is this a bug?
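The second setup would look roughly like this (again a sketch, with placeholder file names):

```python
from tokenizers import ByteLevelBPETokenizer

# Build the tokenizer from existing vocab/merges files, passing dropout to the
# constructor; repeated encodings of the same text now vary between calls.
tokenizer = ByteLevelBPETokenizer("vocab.json", "merges.txt", dropout=0.1)
for _ in range(3):
    print(tokenizer.encode("experimenting with BPE dropout").tokens)
```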
I came up with this workaround: train on the source text, then save the merges and vocab files and use them to create a new tokeniser with dropout.
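Sketched under the same assumptions as above (placeholder paths and parameters; the saved files are assumed to come back in vocab-then-merges order):

```python
from tokenizers import ByteLevelBPETokenizer

# Step 1: train on the source corpus (dropout passed at this stage is ignored,
# which is the bug this issue is about).
trained = ByteLevelBPETokenizer()
trained.train(files=["corpus.txt"], vocab_size=5000, min_frequency=2)

# Step 2: write the model files out and build a fresh tokenizer from them,
# this time with dropout, where it is applied as expected.
vocab_file, merges_file = trained.model.save("./", "workaround")
tokenizer = ByteLevelBPETokenizer(vocab_file, merges_file, dropout=0.1)
```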