Why has the _run_workers_async function of DistributedGPUExecutorAsync been removed since v0.4.3?
The new implementation starts a loop in every worker, which prevents the worker from doing anything else, such as transferring the KV cache in prefill/decode disaggregation. I previously used _run_workers_async to transfer the KV cache without any problems, but now such a call executes only after the worker loops have stopped.
Sorry, I am not familiar with asyncio in Python. What are the benefits of the new implementation? And how can the workers transfer the KV cache asynchronously during generation?
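To make the question concrete, here is a minimal sketch of the behavior I am hoping for, written in plain asyncio. It does not use vLLM's actual API: worker_loop, the inbox queue, and the "step" / "transfer" / "stop" message kinds are all hypothetical names I made up for illustration. The idea is that a long-running worker loop could still accept a side request (such as a KV cache transfer) between generation steps instead of only after the loop has stopped.

```python
import asyncio


async def worker_loop(inbox: asyncio.Queue, log: list) -> None:
    """Hypothetical worker loop (not vLLM code): handles several message kinds."""
    while True:
        kind, payload = await inbox.get()
        if kind == "stop":
            break
        if kind == "step":
            # Stand-in for executing one generation step on the model.
            await asyncio.sleep(0)
            log.append(f"step {payload}")
        elif kind == "transfer":
            # Stand-in for a KV cache transfer requested by the driver.
            await asyncio.sleep(0)
            log.append(f"transfer {payload}")


async def main() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    log: list = []
    worker = asyncio.create_task(worker_loop(inbox, log))
    # The driver interleaves a transfer request between two generation steps.
    messages = [("step", 0), ("transfer", "block-7"), ("step", 1), ("stop", None)]
    for message in messages:
        await inbox.put(message)
    await worker
    return log


log = asyncio.run(main())
print(log)
```

In this sketch the transfer is handled between steps rather than overlapping with a step, so it only shows that the loop does not have to be stopped first.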
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!