
feature: implemented parallel inference for llama-rs, implemented naive sequential async inference for llama-cpp and rwkv-cpp #52

Merged: 5 commits from feature/async into main on May 9, 2023

Conversation

@hlhr202 (Member) commented on May 9, 2023:

  1. llama-rs can fully exploit parallel inference: every inference session gets the same priority and streams its output concurrently with the others.
  2. llama-cpp and rwkv-cpp only wrap their contexts in Arc<Mutex>, so their async inference runs sequentially (see the sketch below).
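
For illustration only, here is a minimal Rust sketch of the second point, assuming a tokio runtime with the full feature set; `Context` and `infer` are hypothetical stand-ins for the non-thread-safe C contexts, not the PR's actual code. Because every request must take the same lock for the whole duration of inference, the API is async at the interface but sequential in execution:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

// Hypothetical stand-in for a non-thread-safe C inference context
// (llama.cpp / rwkv.cpp style).
struct Context;

impl Context {
    // Simulate a slow, exclusive token-generation step.
    async fn infer(&mut self, prompt: &str) -> String {
        tokio::time::sleep(Duration::from_millis(100)).await;
        format!("completion for {prompt:?}")
    }
}

#[tokio::main]
async fn main() {
    let ctx = Arc::new(Mutex::new(Context));

    // Spawn two "concurrent" requests. Both must lock the same Mutex,
    // and the lock is held across the whole inference, so the second
    // request only starts after the first finishes.
    let a = tokio::spawn({
        let ctx = Arc::clone(&ctx);
        async move { ctx.lock().await.infer("prompt A").await }
    });
    let b = tokio::spawn({
        let ctx = Arc::clone(&ctx);
        async move { ctx.lock().await.infer("prompt B").await }
    });

    println!("{}", a.await.unwrap());
    println!("{}", b.await.unwrap());
}
```

llama-rs, as the first point describes, can give each inference session its own state, so concurrent sessions genuinely run in parallel instead of queueing on a single lock.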

hlhr202 merged commit e82222d into main on May 9, 2023.
hlhr202 deleted the feature/async branch on May 13, 2023 at 07:06.