In addition, is it possible for sglang to provide a simple server management UI that shows real-time queue load for prefill and decode? Alternatively, an API endpoint exposing the same data would also work.
-
I sent asynchronous requests to the OpenAI-compatible server on the sglang host, capping concurrency at 1024 with the following decorator:
Client output:
However, the server reported the following:
Why doesn't the number of running requests reach 1024, and why don't the remaining requests show up in the request queue?