Improve Nemo-Serve's speed #14
@YaphetKG pointed out that inference blocks the Uvicorn worker: if too many requests arrive at once, no new HTTP requests are served until the previous processing completes.
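One common way to keep the event loop responsive is to off-load the blocking inference call onto a thread pool. The sketch below uses only the standard library; `blocking_annotate` is a hypothetical stand-in for Nemo-Serve's actual model call, and the pool size is illustrative:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the blocking inference call in Nemo-Serve.
def blocking_annotate(text: str) -> str:
    time.sleep(0.1)  # simulate inference latency
    return f"annotated:{text}"

# A dedicated pool keeps inference threads from starving the default executor.
_pool = ThreadPoolExecutor(max_workers=2)

async def annotate(text: str) -> str:
    # Run the blocking call in a worker thread so the Uvicorn worker's
    # event loop stays free to accept new HTTP requests during inference.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_pool, blocking_annotate, text)

async def main() -> None:
    # Four overlapping requests complete concurrently instead of serially.
    results = await asyncio.gather(*(annotate(f"t{i}") for i in range(4)))
    print(results)

asyncio.run(main())
```

In a FastAPI/Uvicorn handler the same pattern applies: an `async def` endpoint that `await`s the executor call yields control back to the loop while the GPU works.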
Did some testing for different GPU types with the following test text, with POD memory set to 10Gi for the following instances:
My mds-import tool can use Nemo-Serve and SAPBERT to generate 7,569 annotations from 1,141 fields in about 20 minutes, i.e. around 57 fields per minute, which is pretty close to your estimate of around a second per text. This isn't too bad at all for HEAL purposes!
At the moment, the Nemo-Serve CLI takes around 0.8-1 sec/text. Ideally, we would like to speed it up, both so that we can annotate all of split_11 (around 430k lines) and so that it's generally easier to integrate Nemo-Serve into workflows.
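To put those numbers in perspective, a quick back-of-the-envelope calculation shows why the current latency makes annotating all of split_11 impractical (the 0.1 s/text target below is purely illustrative, not a figure from this thread):

```python
# Estimate wall-clock hours to annotate a corpus at a given per-text latency,
# assuming one text is processed at a time (no batching or parallelism).
def hours_for(num_texts: int, sec_per_text: float) -> float:
    return num_texts * sec_per_text / 3600

split_11 = 430_000  # approximate line count from the thread

print(f"{hours_for(split_11, 1.0):.1f} h at 1.0 s/text")  # current upper bound
print(f"{hours_for(split_11, 0.8):.1f} h at 0.8 s/text")  # current lower bound
print(f"{hours_for(split_11, 0.1):.1f} h at 0.1 s/text")  # hypothetical 10x speed-up
```

At ~1 s/text, split_11 would take roughly five days of serial processing, which is the core motivation for the speed-up.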