There are currently no plans to retrain on an English dataset.
However, the only difference between Chinese and English training is the dataset, so this
can be addressed by swapping in an appropriate English dataset (finding a high-quality one will take some exploration).
The tokenizer does not need to be replaced; it can be reused, since it already handles both Chinese and English adequately.
If I complete this, I will update it under this issue.
English support may be done this month, or otherwise sometime within 2024.
Thank you for your attention. Wishing you all the best.
Dear minimind contributors,
I love this repo! Do you plan to provide an English training dataset and tokenizer in the future?
It would be very nice if the repo were more international!