Commit

Merge pull request #9 from eltociear/patch-1
docs: update README.md
alogfans authored Dec 2, 2024
2 parents 564da85 + 57d6847 commit b073838
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -68,7 +68,7 @@ Thanks to the high performance of Transfer Engine, P2P Stores can also distribut
![p2p-store.gif](image/p2p-store.gif)

### vLLM Integration ([Guide](doc/en/vllm-integration.md))
- To optmize LLM inference, the vLLM's community is working at supporting [disaggregated prefilling (PR 8498)](https://github.com/vllm-project/vllm/pull/8498). This feature allows separating the **prefill** phase from the **decode** phase in different processes. The vLLM uses `nccl` and `gloo` as the transport layer by default, but currently it cannot efficiently decouple both phases in different machines.
+ To optimize LLM inference, the vLLM's community is working at supporting [disaggregated prefilling (PR 8498)](https://github.com/vllm-project/vllm/pull/8498). This feature allows separating the **prefill** phase from the **decode** phase in different processes. The vLLM uses `nccl` and `gloo` as the transport layer by default, but currently it cannot efficiently decouple both phases in different machines.

We have implemented vLLM integration, which uses Transfer Engine as the network layer instead of `nccl` and `gloo`, to support **inter-node KVCache transfer**. Transfer Engine provides simpler interface and more efficient use of RDMA devices. In the future, we plan to build Mooncake Store on the basis of Transfer Engine, which supports pooled prefill/decode disaggregation.
