Inference on TPU Pod (v4-64) #183
Comments
Hi, and thanks for using EasyDeL!
Appreciate the quick response. :) I was actually able to run the server, but when I tried using the chat completions endpoint, it threw this error:
You have changed the FSDP size to 32, so you need at least 32 input batches. The second option is to pass 32 batches to vInference; see the sketch below.
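For context, a minimal sketch of what "pass 32 batches" could look like, assuming a Hugging Face tokenizer; the model name and padding choices here are placeholders, not taken from this thread:

```python
from transformers import AutoTokenizer

# Hypothetical sketch: the input batch dimension must match the FSDP axis
# size (32 in this setup). The model name below is a placeholder.
FSDP_SIZE = 32
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

prompts = ["Write a haiku about TPUs."] * FSDP_SIZE  # fill out a full batch
batch = tokenizer(prompts, return_tensors="np", padding=True)
assert batch["input_ids"].shape[0] == FSDP_SIZE  # must equal the fsdp axis
```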
Didn’t work with any configuration combination in the end. :( Any idea what the issue is for multi-node TPU and how to fix it?
Have you tested with sample inputs? To get started:

```bash
eopod kill-tpu --force
eopod run "cd EasyDeL && git pull && USE_AOT=false python tests/vinference_runtime_test.py"
```
I’ve tested this code on TPU v4-32. While I primarily use GPUs and don’t have full sponsorship from TRC, I can confirm that this code works reliably on both 16×A100 GPUs and TPU v4-32. Also, I might be slow to respond to GitHub issues; if it’s urgent, feel free to DM me on Discord. Let me know if you encounter any issues.
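As an aside not from the thread: before digging into configs on a multi-node pod, it can help to confirm every worker actually sees the whole pod. This uses only standard JAX calls:

```python
import jax

# Each TPU v4 worker hosts 4 chips; a v4-32 pod has 4 workers (16 chips)
# and a v4-64 has 8 workers (32 chips). Every worker process should see
# the full pod through jax.devices().
print("process:", jax.process_index(), "of", jax.process_count())
print("local devices:", jax.local_device_count())  # expect 4 per v4 worker
print("global devices:", jax.device_count())       # expect 32 on a v4-64
```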
@erfanzar I've been trying to run inference on TPU v4-32 as well, but am running into issues. When I run the command you gave, I get:
Any tips? Also, what is your Discord?
Hi @nathom, thank you for using EasyDeL! From the errors you're encountering, it seems like you might be using a JAX version lower than the one EasyDeL requires. If updating JAX doesn't fix the issue, you can disable EasyDeL's auto flags by setting the environment variable EASYDEL_AUTO=0 before running your script. Let me know if you need further assistance!
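A small sketch of one way to set that variable from Python; the assumption that EASYDEL_AUTO has to be set before the import is mine, not stated in the thread:

```python
import os

# Assumption: EASYDEL_AUTO is read when easydel is imported,
# so it has to be set before the import happens.
os.environ["EASYDEL_AUTO"] = "0"

import easydel as ed  # imported with auto flag configuration disabled
```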
@nathom Sorry, I forgot about Discord. Search for
That fixes the issue, thanks!
Hello @creatorrr and @nathom, thank you for using EasyDeL! I’m happy to share that the issue with sharding model states across multiple nodes on TPUs has been resolved and is now fully functional. Tested on:
Let us know if you encounter any further issues or have feedback!
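For readers landing here, a rough sketch of what a mesh spanning multiple hosts looks like in plain JAX; the (dp, fsdp, tp, sp) axis names mirror EasyDeL's convention but are an assumption, not taken from this thread:

```python
import jax
import numpy as np
from jax.sharding import Mesh

# Illustrative only: jax.devices() returns every device across all hosts,
# so this mesh spans the whole pod. The fsdp axis absorbs all remaining
# devices (32 on a v4-64).
devices = np.array(jax.devices()).reshape(1, -1, 1, 1)
mesh = Mesh(devices, axis_names=("dp", "fsdp", "tp", "sp"))
print(dict(mesh.shape))  # e.g. {'dp': 1, 'fsdp': 32, 'tp': 1, 'sp': 1}
```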
Describe the bug
I am trying to run inference on a TPU Pod v4-64 (8 workers, 32 devices). I tried running the following code from one of the examples:
Am a complete noob at both TPUs and EasyDeL, so maybe I am missing something really obvious? Plz halp!
Getting this error:
Traceback: