Add cost calculation via token counting #580
See some prior discussion about this in #315. Could you suggest …
In my rough implementation, … As I explain in the …
RepoPrompt shows the current context token total, with a percentage per file and directory. In the creator's screencasts this is used as a quick way to spot outlier large files (e.g. package.lock and tags files) that can be eliminated to conserve input tokens, not to provide a price.
Many API providers return the token count or cost as part of the response. E.g. both OpenAI and Anthropic return a `usage` block with each reply.
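For illustration, pricing a reply from that usage block might look like the sketch below. The field names follow the documented OpenAI (`prompt_tokens`/`completion_tokens`) and Anthropic (`input_tokens`/`output_tokens`) response formats; the function name and price arguments are hypothetical, not gptel API.

```elisp
(defun my/llm-response-cost (response in-price out-price)
  "Return the dollar cost of RESPONSE, an alist as returned by `json-read'.
IN-PRICE and OUT-PRICE are dollars per million tokens."
  (let* ((usage (alist-get 'usage response))
         (in  (or (alist-get 'prompt_tokens usage)     ; OpenAI
                  (alist-get 'input_tokens usage)))    ; Anthropic
         (out (or (alist-get 'completion_tokens usage)
                  (alist-get 'output_tokens usage))))
    (+ (* in  (/ in-price 1e6))
       (* out (/ out-price 1e6)))))
```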
Perhaps we could use that if the provider (for example Anthropic) has a token-counting endpoint. But for the sake of simplicity, just surfacing the "Last query cost" and "Running total" in gptel-menu for endpoints that return `usage`, as @bharadswami pointed out, would be easier. The problem with tracking what you're about to spend is that you don't know what you're going to send until you send it. Maybe an additional option that queries the endpoint when the query goes beyond some character threshold and asks for confirmation that you want to spend $X on the query?
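A rough sketch of that confirmation gate, done as advice around `gptel-send`. The threshold variable and advice name are hypothetical, buffer size is only a crude upper bound on what gptel actually sends, and a real version would convert the count into a dollar figure before prompting.

```elisp
(defvar my/gptel-confirm-threshold 20000
  "Character count above which `gptel-send' asks for confirmation.")

(define-advice gptel-send (:around (orig &rest args) my/confirm-large)
  "Ask before sending when the buffer exceeds the size threshold."
  (if (or (< (buffer-size) my/gptel-confirm-threshold)
          (y-or-n-p (format "Buffer has %d characters; send anyway? "
                            (buffer-size))))
      (apply orig args)
    (message "gptel: send cancelled")))
```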
This is a good idea. If someone finds documentation on a token-counting endpoint (for any supported API), please include it in this thread. For Anthropic this is https://docs.anthropic.com/en/docs/build-with-claude/token-counting.
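A minimal sketch of calling that endpoint (`POST /v1/messages/count_tokens`, which replies with `{"input_tokens": N}`) from Emacs with url.el. The function name and the `ANTHROPIC_API_KEY` environment variable are assumptions, not gptel API.

```elisp
(require 'url)
(require 'json)

(defun my/anthropic-count-tokens (model text)
  "Return Anthropic's token count for TEXT as a user message under MODEL."
  (let* ((url-request-method "POST")
         (url-request-extra-headers
          `(("content-type" . "application/json")
            ("x-api-key" . ,(getenv "ANTHROPIC_API_KEY"))
            ("anthropic-version" . "2023-06-01")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `(("model" . ,model)
              ("messages" . ,(vector `(("role" . "user")
                                       ("content" . ,text))))))
           'utf-8))
         (buf (url-retrieve-synchronously
               "https://api.anthropic.com/v1/messages/count_tokens")))
    (with-current-buffer buf
      (goto-char url-http-end-of-headers)
      (cdr (assq 'input_tokens (json-read))))))
```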
It looks like you can count tokens for Gemini with the models.countTokens API method. Looking for a token-counting endpoint for OpenAI only turned up tiktoken in my search. If I wanted to, I would use js-tiktoken, a port of tiktoken that runs on Node, and run it in a Cloudflare Worker. I'm not sure if this would work, but I haven't found a better option. Ditto with DeepSeek; they expect you to count tokens locally. xAI has an endpoint: …
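For reference, a sketch of the Gemini call in the same style as the Anthropic one above, assuming a `GEMINI_API_KEY` environment variable; the function name is hypothetical.

```elisp
(require 'url)
(require 'json)

(defun my/gemini-count-tokens (model text)
  "Return Gemini's token count for TEXT under MODEL, e.g. \"gemini-1.5-flash\"."
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("content-type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `(("contents"
               . ,(vector `(("parts" . ,(vector `(("text" . ,text)))))))))
           'utf-8))
         (buf (url-retrieve-synchronously
               (format "https://generativelanguage.googleapis.com/v1beta/models/%s:countTokens?key=%s"
                       model (getenv "GEMINI_API_KEY")))))
    (with-current-buffer buf
      (goto-char url-http-end-of-headers)
      ;; The endpoint replies with {"totalTokens": N, ...}.
      (cdr (assq 'totalTokens (json-read))))))
```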
It seems tricky to estimate the cost of an LLM query before making the API request, since we can't predict the number of output tokens until the LLM processes the request. Given that output tokens cost more than input tokens for almost all models, minor errors in counting the input tokens might not matter much for the approximate cost shown to the user asking for confirmation. Has anyone compared the token counts produced by the different LLMs' tokenizers? If the variation is not that high, for a v0.1 implementation we could simply use the Anthropic count-tokens endpoint for every backend.

P.S. Personally I'm using a crude approach that wraps the send command:

```elisp
;; within the wrapper's let*
(let* ((prompt (gptel--create-prompt start-marker))
       ;; crude proxy for token count: whitespace-separated words
       (word-count (length (split-string prompt "\\W+" t))))
  ;; ...
  (if (>= word-count 1000)
      (y-or-n-p "Input is over 1000 words; send anyway? ") ; ask for confirmation
    t))
```

This check catches most cases where I want confirmation, i.e. accidentally sending massive inputs when I call `gptel-send`.
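To make the "minor errors might not matter" point concrete, here is a back-of-envelope check with purely illustrative prices ($3/M input, $15/M output, roughly Sonnet-class; not tied to any particular model):

```elisp
;; How much does a 10% error in the input-token count move the total
;; when output tokens dominate the price?
(let* ((cost  (lambda (in out) (+ (* in 3e-6) (* out 15e-6))))
       (exact (funcall cost 2000 1000))   ; => 0.021  ($0.006 + $0.015)
       (off   (funcall cost 2200 1000)))  ; +10% input error => 0.0216
  (/ (- off exact) exact))                ; => ~0.029, i.e. ~3% error
```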
@axelknock thanks for digging through the documentation! I've made a note.
That requires the user to have an Anthropic API key, I think.
It does require an Anthropic API key, but token counting is a free API request. There could also be the option of running tiktoken on the client device in case the user wants to perform all operations locally, although I'm not sure if there's a way to get token usage on file uploads and tool use through tiktoken.
To get a better feel for how expensive long chats can be, a cost estimation would be helpful.

Anthropic allows for token counting, see https://docs.anthropic.com/en/docs/build-with-claude/token-counting. The count can be multiplied by the cost of the current model.
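A minimal sketch of that calculation, reusing the hypothetical `my/anthropic-count-tokens` helper from earlier in the thread with an illustrative price; the model name is just an example:

```elisp
;; Estimated cost = input tokens x per-token input price. The output
;; side can only be added once the response's usage block arrives.
(let* ((input-tokens (my/anthropic-count-tokens
                      "claude-3-5-sonnet-20241022" "Hello, world"))
       (price-per-mtok 3.0))  ; illustrative $ per million input tokens
  (message "Estimated input cost: $%.6f"
           (* input-tokens (/ price-per-mtok 1e6))))
```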