[BFCL] Prompt Caching for Claude Models (#751)
This PR adds prompt caching when running inference on Claude models. It significantly reduces the cost of inference on BFCL's multi-turn datasets when using the following models (in both Function Calling and Prompting modes):

- Claude 3.5 Sonnet
- Claude 3 Haiku
- Claude 3 Opus

Summary of changes made:

- Cached user messages
- Cached system prompt (for Prompting mode)
- Cached tools (for Function-Calling mode)

Please note:

- This implementation deliberately avoids caching in single-turn cases, since there are no future turns that could benefit from cache reads.
- According to the [Anthropic guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#cache-storage-and-sharing), using prompt caching **will not** affect model accuracy:

> Prompt caching has no effect on output token generation. The response you receive will be identical to what you would get if prompt caching was not used.

---------

Co-authored-by: Huanzhi (Hans) Mao <[email protected]>
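For context, Anthropic's prompt caching works by tagging prompt-prefix blocks with `cache_control` markers in the Messages API. Below is a minimal sketch of that pattern using the official `anthropic` Python SDK; the model name, system prompt, tool schema, and message text are illustrative placeholders, not BFCL's actual handler code:

```python
# Sketch of the caching pattern described above, using the `anthropic` SDK.
# All prompt content here is hypothetical; only the `cache_control` mechanics
# mirror what the PR applies to BFCL's multi-turn handlers.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # Marking the last tool caches the whole tool-definition prefix
        # (Function-Calling mode).
        "cache_control": {"type": "ephemeral"},
    }
]

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's the weather in Berlin?",
                # Caching the latest user message lets later turns reuse the
                # shared conversation prefix (only useful in multi-turn runs).
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    # In Prompting mode, the system prompt is cached the same way:
    system=[
        {
            "type": "text",
            "text": "You are a function-calling assistant.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    tools=tools,
    messages=messages,
)
print(response.content)
```

Note that at the time of this PR, prompt caching was a beta feature and older SDK versions required passing a beta header (e.g. `extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"}`); it has since become generally available.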
1 parent 19490f1 · commit 5a42197
Showing 2 changed files with 79 additions and 16 deletions.