
Add cost calculation via token counting #580

Open · pirminj opened this issue on Jan 23, 2025 · 10 comments
Labels: enhancement (New feature or request)

Comments

@pirminj (Contributor) commented Jan 23, 2025

To get a better feeling for how expensive long chats can be, a cost estimate may be helpful. Anthropic supports token counting, see https://docs.anthropic.com/en/docs/build-with-claude/token-counting. The token count can be multiplied by the per-token cost of the current model.
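(As a rough sketch of the arithmetic: the estimate is just the token counts weighted by per-token prices. The Elisp below is illustrative only; my/gptel-estimate-cost and the prices inside it are placeholders, not real rates.)

;; Hypothetical sketch: estimate cost from token counts.
;; The per-million-token prices are placeholders, not real rates.
(defun my/gptel-estimate-cost (input-tokens output-tokens)
  (let ((input-price 3.0)    ; $ per 1M input tokens (placeholder)
        (output-price 15.0)) ; $ per 1M output tokens (placeholder)
    (/ (+ (* input-tokens input-price)
          (* output-tokens output-price))
       1000000.0)))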

@pirminj added the enhancement label on Jan 23, 2025
@karthink (Owner)

See some prior discussion about this in #315.

Could you suggest

  1. When this token counting should happen, keeping in mind that gptel can be used from anywhere?
  2. How this count should be indicated to the user?

@benthamite (Contributor) commented Jan 25, 2025

In my rough implementation,

  1. it happens in any buffer in which gptel-mode is enabled, and
  2. it is displayed as an extra header-line element.

[Screenshot: estimated cost shown in the header line]

As I explain in the gptel-extras-get-cost docstring, the cost is an approximation and has some limitations. I'm not sure whether models other than Anthropic's allow for token counting, but perhaps one could fall back to an approximation when this information is not provided.
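(For anyone who wants to experiment, a minimal sketch of such a header-line element is below. my/gptel-last-cost is a hypothetical variable to be updated from a response callback; note also that gptel-mode manages its own header line, so a real implementation would integrate with that rather than append blindly.)

;; Sketch: display a cost estimate in the header line of gptel buffers.
;; `my/gptel-last-cost' is hypothetical; update it after each response.
(defvar-local my/gptel-last-cost nil)

(defun my/gptel-cost-header ()
  (when (and (bound-and-true-p gptel-mode) my/gptel-last-cost)
    (format " ~$%.4f" my/gptel-last-cost)))

(add-hook 'gptel-mode-hook
          (lambda ()
            (setq header-line-format
                  (append (if (listp header-line-format)
                              header-line-format
                            (list header-line-format))
                          '((:eval (my/gptel-cost-header)))))))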

@vdm commented Jan 25, 2025

RepoPrompt shows the current context token total, with a percentage per file and directory.

[Screenshot: RepoPrompt's per-file and per-directory token breakdown]

In the creator's screencasts, this is used as a quick way to spot outlier large files (e.g. package.lock and tags files) that can be removed to conserve input tokens, not to show a price.

@bharadswami

Many API providers return the token count or cost as part of the response. For example, both OpenAI and Anthropic return a usage field, and OpenRouter returns a generation id that can be queried against its generations endpoint to get the token count, total_cost, etc. So we don't need to send the whole prompt again just to count tokens separately.
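(Illustrative sketch: given a response already parsed with json-read, pulling out the usage counts looks roughly like this; the key names follow the documented Anthropic and OpenAI response shapes.)

;; Sketch: extract (input . output) token counts from a parsed response alist.
;; Anthropic uses `input_tokens'/`output_tokens'; OpenAI uses
;; `prompt_tokens'/`completion_tokens'.
(defun my/response-token-usage (response)
  (let ((usage (alist-get 'usage response)))
    (cons (or (alist-get 'input_tokens usage)
              (alist-get 'prompt_tokens usage))
          (or (alist-get 'output_tokens usage)
              (alist-get 'completion_tokens usage)))))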

@axelknock (Contributor)

Perhaps if the provider (for example, Anthropic) has a token-counting endpoint, gptel-inspect-query could have an option, gptel-query-cost, that queries the endpoint and displays the result in the mode line. This endpoint would likely be added as another option to gptel-make-*.

But for the sake of simplicity, surfacing the "Last query cost" and "Running total" in gptel-menu for endpoints that return usage, as @bharadswami pointed out, would be easier.

The problem with tracking what you're about to spend is that you don't know what you're going to send until you send it.

Maybe an additional option that queries the endpoint when the query exceeds some character threshold and asks for confirmation that you want to spend $X on the query?

@karthink (Owner) commented Feb 8, 2025

> Perhaps if the provider (for example, Anthropic) has a token-counting endpoint, gptel-inspect-query could have an option, gptel-query-cost, that queries the endpoint and displays the result in the mode line. This endpoint would likely be added as another option to gptel-make-*.

This is a good idea. If someone finds documentation on a token-counting endpoint for any supported API, please post it in this thread.

For Anthropic this is https://api.anthropic.com/v1/messages/count_tokens
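(A minimal synchronous sketch against that endpoint with url.el, for anyone who wants to experiment; my/anthropic-key is an assumed variable holding a valid key, and the headers and body follow Anthropic's documented count_tokens request shape.)

(require 'url)
(require 'json)

;; Sketch: ask Anthropic's count_tokens endpoint for the input token count.
;; `my/anthropic-key' is assumed; the response carries an `input_tokens' field.
(defun my/anthropic-count-tokens (model text)
  (let* ((url-request-method "POST")
         (url-request-extra-headers
          `(("x-api-key" . ,my/anthropic-key)
            ("anthropic-version" . "2023-06-01")
            ("content-type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `((model . ,model)
              (messages . [((role . "user") (content . ,text))])))
           'utf-8))
         (buf (url-retrieve-synchronously
               "https://api.anthropic.com/v1/messages/count_tokens")))
    (with-current-buffer buf
      (goto-char url-http-end-of-headers)
      (alist-get 'input_tokens (json-read)))))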

@axelknock (Contributor)

It looks like you can count tokens for Gemini with this: https://generativelanguage.googleapis.com/v1beta/{model=models/*}:countTokens.

Searching for a token-counting endpoint for OpenAI only turned up tiktoken. If I wanted to, I would use js-tiktoken (a port of tiktoken that runs on Node) and run it in a Cloudflare Worker. I'm not sure this would work, but I haven't found a better option.

Ditto with DeepSeek: they expect you to count tokens locally.

xAI has an endpoint: /v1/tokenize-text.
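(The same pattern works for the Gemini endpoint above; a sketch assuming a hypothetical my/gemini-key, with the request body following Google's documented countTokens shape:)

(require 'url)
(require 'json)

;; Sketch: count tokens via Gemini's countTokens endpoint.
;; `my/gemini-key' is assumed; the response carries a `totalTokens' field.
(defun my/gemini-count-tokens (model text)
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("content-type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `((contents . [((parts . [((text . ,text))]))])))
           'utf-8))
         (buf (url-retrieve-synchronously
               (format "https://generativelanguage.googleapis.com/v1beta/models/%s:countTokens?key=%s"
                       model my/gemini-key))))
    (with-current-buffer buf
      (goto-char url-http-end-of-headers)
      (alist-get 'totalTokens (json-read)))))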

@bharadswami

It seems tricky to estimate the cost of an LLM query before making the API request, since we can't predict the number of output tokens until the LLM processes it. But given that output tokens cost more than input tokens for almost all models, minor errors in counting the input tokens may not matter much for an approximate cost displayed to the user when asking for confirmation.

Has anyone compared the token counts produced by each LLM's tokenizer? If the variation is not too high, for a v0.1 implementation we could simply use the Anthropic count_tokens endpoint for all models to get an estimate, turn cost calculation off by default, and provide appropriate disclaimers.

P.S. Personally, I'm using a crude approach: wrapping gptel-send in a custom function that checks the word count of the prompt with a regexp and asks for confirmation if it exceeds 1000 words:

;; within a let* in a custom wrapper around `gptel-send'
(prompt (gptel--create-prompt start-marker))
(word-count (length (split-string prompt "\\W+" t)))
;; ... then, in the body:
(if (>= word-count 1000)
    (when (y-or-n-p (format "Prompt is %d words; send anyway? " word-count))
      (gptel-send))
  (gptel-send))
This check catches most of the cases where I want confirmation, i.e. accidentally sending a massive input when I call gptel-send from a writing/code buffer, or not realizing how long the conversation chain has gotten.

@karthink (Owner) commented Feb 9, 2025

@axelknock thanks for digging through the documentation! I've made a note.

> If the variation is not too high, for a v0.1 implementation we could simply use the Anthropic count_tokens endpoint for all models to get an estimate.

That requires the user to have an Anthropic API key, I think.

@bharadswami

> That requires the user to have an Anthropic API key, I think.

It does require an Anthropic API key, but token counting is a free API request. There could also be an option to run tiktoken on the client device in case the user wants to perform all operations locally, although I'm not sure there's a way to get token usage for file uploads and tool use through tiktoken.
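(As a last-resort local fallback, the common rule of thumb of roughly four characters per token needs no tokenizer at all, though it is only a coarse approximation:)

;; Crude local fallback: ~4 characters per token is a common rule of thumb.
;; Real tokenizers vary by model; treat this as an order-of-magnitude guess.
(defun my/approx-token-count (text)
  (ceiling (length text) 4))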
