
feature: implemented parallel inference for llama-rs, implemented naive sequential async inference for llama-cpp and rwkv-cpp #52

Merged: 5 commits from feature/async into main on May 9, 2023

Conversation

@hlhr202 (Member) commented on May 9, 2023:

  1. llama-rs can fully exploit parallel inference: every inference session gets the same priority and streams its output concurrently with the others.
  2. llama-cpp and rwkv-cpp only wrap their contexts in Arc<Mutex>, so their async inference runs sequentially (see the sketch below).
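
For illustration only, here is a minimal Rust sketch of the second point, assuming a tokio runtime with the full feature set; `Context` and `infer` are hypothetical stand-ins for the non-thread-safe C contexts, not the PR's actual code. Because every request must take the same lock for the whole duration of inference, the API is async at the interface but sequential in execution:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

// Hypothetical stand-in for a non-thread-safe C inference context
// (llama.cpp / rwkv.cpp style).
struct Context;

impl Context {
    // Simulate a slow, exclusive token-generation step.
    async fn infer(&mut self, prompt: &str) -> String {
        tokio::time::sleep(Duration::from_millis(100)).await;
        format!("completion for {prompt:?}")
    }
}

#[tokio::main]
async fn main() {
    let ctx = Arc::new(Mutex::new(Context));

    // Spawn two "concurrent" requests. Both must lock the same Mutex,
    // and the lock is held across the whole inference, so the second
    // request only starts after the first finishes.
    let a = tokio::spawn({
        let ctx = Arc::clone(&ctx);
        async move { ctx.lock().await.infer("prompt A").await }
    });
    let b = tokio::spawn({
        let ctx = Arc::clone(&ctx);
        async move { ctx.lock().await.infer("prompt B").await }
    });

    println!("{}", a.await.unwrap());
    println!("{}", b.await.unwrap());
}
```

llama-rs, as the first point describes, can give each inference session its own state, so concurrent sessions genuinely run in parallel instead of queueing on a single lock.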

hlhr202 merged commit e82222d into main on May 9, 2023.
hlhr202 deleted the feature/async branch on May 13, 2023 at 07:06.