Hi,

I remember that vLLM support was on your TODO list. Have you implemented it yet? Was the main challenge in this direction that tree verification with batch size > 1 is hard to make efficient? Thanks!
Currently we have not added support for vLLM; we are working on building a tensor parallelism system first. With batch size > 1, we need to solve some additional problems: for example, the number of accepted tokens can differ for each request in the same batch, and communication time is not accounted for in the current implementation. Once the tensor parallelism system is built, we will make it compatible with vLLM and other inference engines. Thank you!
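To illustrate the ragged-acceptance problem mentioned above, here is a minimal sketch (not the repository's actual code): after verifying draft tokens for a batch, each request may accept a different number of tokens, so the batch must be re-packed before the next decoding step. The names `verified_tokens`, `num_accepted`, and `repack_batch` are hypothetical.

```python
# Sketch only: re-pack a batch where each request accepted a different
# number of speculated tokens. Names here are illustrative, not the
# repository's actual API.
import torch

def repack_batch(verified_tokens: torch.Tensor,
                 num_accepted: torch.Tensor,
                 pad_id: int = 0) -> tuple[torch.Tensor, torch.Tensor]:
    """verified_tokens: (batch, max_draft_len) candidate tokens per request.
    num_accepted:      (batch,) how many of those tokens passed verification.
    Returns right-padded accepted tokens and a validity mask."""
    batch, _ = verified_tokens.shape
    new_len = int(num_accepted.max().item())
    # Positions beyond each request's accepted count are masked out.
    positions = torch.arange(new_len, device=verified_tokens.device)
    mask = positions.unsqueeze(0) < num_accepted.unsqueeze(1)  # (batch, new_len)
    packed = torch.full((batch, new_len), pad_id,
                        dtype=verified_tokens.dtype,
                        device=verified_tokens.device)
    packed[mask] = verified_tokens[:, :new_len][mask]
    return packed, mask

# Example: request 0 accepts 3 draft tokens, request 1 accepts only 1.
tokens = torch.tensor([[11, 12, 13, 14],
                       [21, 22, 23, 24]])
accepted = torch.tensor([3, 1])
packed, mask = repack_batch(tokens, accepted)
# packed -> [[11, 12, 13], [21, 0, 0]]; mask marks the valid positions.
```

Because the shorter requests must be padded up to the longest acceptance in the batch, naive batching wastes compute on masked positions, which is one reason efficient batch-size > 1 tree verification is nontrivial.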