Creating a French Llama version by translating RedPajama dataset
Meta's LLaMA models were trained on a massive amount of data: 1.0T tokens for the 7B/13B variants and 1.4T tokens for the larger ones, using 2048 A100 (80GB) GPUs over a period of roughly 5 months. Continuing the pre-training of LLaMA on a French corpus is definitely a promising way to improve its performance on French. However, this option is still quite expensive and requires significant computational resources. I'm currently pre-training it on a small French dataset to see how much it actually helps. Stay tuned!
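For reference, here is a minimal sketch of what continued pre-training could look like with the Hugging Face `transformers` Trainer. The checkpoint (`huggyllama/llama-7b`), the French corpus (OSCAR's deduplicated French split), and all hyperparameters are illustrative assumptions, not the actual setup used here:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "huggyllama/llama-7b"  # assumption: any LLaMA checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA's tokenizer ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stream a French corpus so it never has to fit in memory or on disk at once.
corpus = load_dataset(
    "oscar", "unshuffled_deduplicated_fr", split="train", streaming=True
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["id", "text"])

# Standard causal-LM objective: labels are the input ids themselves,
# and the model shifts them internally (mlm=False).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="llama-7b-fr-continued",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    max_steps=10_000,  # a streaming dataset needs an explicit step budget
    bf16=True,
    logging_steps=50,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

To keep the cost down, parameter-efficient methods such as LoRA could replace the full-model update sketched above; the data pipeline would stay the same.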
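As for the idea in the title, a rough sketch of translating RedPajama into French could look like the following. The `togethercomputer/RedPajama-Data-1T-Sample` dataset and the `Helsinki-NLP/opus-mt-en-fr` MarianMT model are assumptions chosen for illustration, not a confirmed pipeline from this thread:

```python
from datasets import load_dataset
from transformers import pipeline

# English -> French translation model; an assumption for this sketch.
translator = pipeline(
    "translation",
    model="Helsinki-NLP/opus-mt-en-fr",
    device=0,  # first GPU; drop this argument to run on CPU
)

# Stream the RedPajama sample split rather than downloading the full 1T corpus.
redpajama = load_dataset(
    "togethercomputer/RedPajama-Data-1T-Sample", split="train", streaming=True
)

def translate_batch(batch):
    # Crude character-level truncation keeps the sketch simple; a real
    # pipeline would split each document into sentences before translating.
    texts = [t[:1000] for t in batch["text"]]
    outputs = translator(texts, max_length=512)
    return {"text_fr": [o["translation_text"] for o in outputs]}

french = redpajama.map(translate_batch, batched=True, batch_size=8)

# Peek at the first couple of translated documents.
for example in french.take(2):
    print(example["text_fr"][:200])
```

Note that translating anywhere near the full 1T-token corpus would itself be a substantial compute job, which is part of why continued pre-training on existing French text looks attractive by comparison.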