
Add cost calculation via token counting #580

Open · pirminj opened this issue on Jan 23, 2025 · 10 comments
Labels: enhancement (New feature or request)

Comments

@pirminj (Contributor) commented Jan 23, 2025

To get a better feeling for how expensive long chats can be, a cost estimate may be helpful. Anthropic supports token counting, see https://docs.anthropic.com/en/docs/build-with-claude/token-counting. The token count can be multiplied by the per-token cost of the current model.
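(As a rough sketch of the arithmetic: the estimate is just the token counts weighted by per-token prices. The Elisp below is illustrative only; my/gptel-estimate-cost and the prices inside it are placeholders, not real rates.)

;; Hypothetical sketch: estimate cost from token counts.
;; The per-million-token prices are placeholders, not real rates.
(defun my/gptel-estimate-cost (input-tokens output-tokens)
  (let ((input-price 3.0)    ; $ per 1M input tokens (placeholder)
        (output-price 15.0)) ; $ per 1M output tokens (placeholder)
    (/ (+ (* input-tokens input-price)
          (* output-tokens output-price))
       1000000.0)))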

@pirminj added the enhancement label on Jan 23, 2025
@karthink (Owner)

See some prior discussion about this in #315.

Could you suggest

  1. When this token counting should happen, keeping in mind that gptel can be used from anywhere?
  2. How this count should be indicated to the user?

@benthamite (Contributor) commented Jan 25, 2025

In my rough implementation,

  1. it happens in any buffer in which gptel-mode is enabled, and
  2. it is displayed as an extra header-line element.

[Screenshot: estimated cost shown in the header line]

As I explain in the gptel-extras-get-cost docstring, the cost is an approximation and has some limitations. I'm not sure whether models other than Anthropic's allow for token counting, but perhaps one could fall back to an approximation when this information is not provided.
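(For anyone who wants to experiment, a minimal sketch of such a header-line element is below. my/gptel-last-cost is a hypothetical variable to be updated from a response callback; note also that gptel-mode manages its own header line, so a real implementation would integrate with that rather than append blindly.)

;; Sketch: display a cost estimate in the header line of gptel buffers.
;; `my/gptel-last-cost' is hypothetical; update it after each response.
(defvar-local my/gptel-last-cost nil)

(defun my/gptel-cost-header ()
  (when (and (bound-and-true-p gptel-mode) my/gptel-last-cost)
    (format " ~$%.4f" my/gptel-last-cost)))

(add-hook 'gptel-mode-hook
          (lambda ()
            (setq header-line-format
                  (append (if (listp header-line-format)
                              header-line-format
                            (list header-line-format))
                          '((:eval (my/gptel-cost-header)))))))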

@vdm commented Jan 25, 2025

RepoPrompt shows the current context token total, with a percentage per file and directory.

[Screenshot: RepoPrompt's per-file and per-directory token breakdown]

In the creator's screencasts, this is used as a quick way to spot outlier large files (e.g. package.lock and tags files) that can be removed to conserve input tokens, not to show a price.

@bharadswami

Many API providers return the token count or cost as part of the response. For example, both OpenAI and Anthropic return a usage field, and OpenRouter returns a generation id that can be queried against its generations endpoint to get the token count, total_cost, etc. So we don't need to send the whole prompt again just to count tokens separately.
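(Illustrative sketch: given a response already parsed with json-read, pulling out the usage counts looks roughly like this; the key names follow the documented Anthropic and OpenAI response shapes.)

;; Sketch: extract (input . output) token counts from a parsed response alist.
;; Anthropic uses `input_tokens'/`output_tokens'; OpenAI uses
;; `prompt_tokens'/`completion_tokens'.
(defun my/response-token-usage (response)
  (let ((usage (alist-get 'usage response)))
    (cons (or (alist-get 'input_tokens usage)
              (alist-get 'prompt_tokens usage))
          (or (alist-get 'output_tokens usage)
              (alist-get 'completion_tokens usage)))))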

@axelknock (Contributor)

Perhaps if the provider (for example, Anthropic) has a token-counting endpoint, gptel-inspect-query could have an option, gptel-query-cost, that queries the endpoint and displays the result in the mode line. This endpoint would likely be added as another option to gptel-make-*.

But for the sake of simplicity, surfacing the "Last query cost" and "Running total" in gptel-menu for endpoints that return usage, as @bharadswami pointed out, would be easier.

The problem with tracking what you're about to spend is that you don't know what you're going to send until you send it.

Maybe an additional option that queries the endpoint when the query exceeds some character threshold and asks for confirmation that you want to spend $X on the query?

@karthink (Owner) commented Feb 8, 2025

> Perhaps if the provider (for example, Anthropic) has a token-counting endpoint, gptel-inspect-query could have an option, gptel-query-cost, that queries the endpoint and displays the result in the mode line. This endpoint would likely be added as another option to gptel-make-*.

This is a good idea. If someone finds documentation on a token-counting endpoint for any supported API, please post it in this thread.

For Anthropic this is https://api.anthropic.com/v1/messages/count_tokens
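(A minimal synchronous sketch against that endpoint with url.el, for anyone who wants to experiment; my/anthropic-key is an assumed variable holding a valid key, and the headers and body follow Anthropic's documented count_tokens request shape.)

(require 'url)
(require 'json)

;; Sketch: ask Anthropic's count_tokens endpoint for the input token count.
;; `my/anthropic-key' is assumed; the response carries an `input_tokens' field.
(defun my/anthropic-count-tokens (model text)
  (let* ((url-request-method "POST")
         (url-request-extra-headers
          `(("x-api-key" . ,my/anthropic-key)
            ("anthropic-version" . "2023-06-01")
            ("content-type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `((model . ,model)
              (messages . [((role . "user") (content . ,text))])))
           'utf-8))
         (buf (url-retrieve-synchronously
               "https://api.anthropic.com/v1/messages/count_tokens")))
    (with-current-buffer buf
      (goto-char url-http-end-of-headers)
      (alist-get 'input_tokens (json-read)))))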

@axelknock (Contributor)

It looks like you can count tokens for Gemini with this: https://generativelanguage.googleapis.com/v1beta/{model=models/*}:countTokens.

Searching for a token-counting endpoint for OpenAI only turned up tiktoken. If I wanted to, I would use js-tiktoken (a port of tiktoken that runs on Node) and run it in a Cloudflare Worker. I'm not sure this would work, but I haven't found a better option.

Ditto with DeepSeek: they expect you to count tokens locally.

xAI has an endpoint: /v1/tokenize-text.
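(The same pattern works for the Gemini endpoint above; a sketch assuming a hypothetical my/gemini-key, with the request body following Google's documented countTokens shape:)

(require 'url)
(require 'json)

;; Sketch: count tokens via Gemini's countTokens endpoint.
;; `my/gemini-key' is assumed; the response carries a `totalTokens' field.
(defun my/gemini-count-tokens (model text)
  (let* ((url-request-method "POST")
         (url-request-extra-headers '(("content-type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `((contents . [((parts . [((text . ,text))]))])))
           'utf-8))
         (buf (url-retrieve-synchronously
               (format "https://generativelanguage.googleapis.com/v1beta/models/%s:countTokens?key=%s"
                       model my/gemini-key))))
    (with-current-buffer buf
      (goto-char url-http-end-of-headers)
      (alist-get 'totalTokens (json-read)))))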

@bharadswami

It seems tricky to estimate the cost of an LLM query before making the API request, since we can't predict the number of output tokens until the LLM processes it. But given that output tokens cost more than input tokens for almost all models, minor errors in counting the input tokens may not matter much for an approximate cost displayed to the user when asking for confirmation.

Has anyone compared the token counts produced by each LLM's tokenizer? If the variation is not too high, for a v0.1 implementation we could simply use the Anthropic count_tokens endpoint for all models to get an estimate, turn cost calculation off by default, and provide appropriate disclaimers.

P.S. Personally, I'm using a crude approach: wrapping gptel-send in a custom function that checks the word count of the prompt with a regexp and asks for confirmation if it exceeds 1000 words:

;; within a let* in a custom wrapper around `gptel-send'
(prompt (gptel--create-prompt start-marker))
(word-count (length (split-string prompt "\\W+" t)))
;; ... then, in the body:
(if (>= word-count 1000)
    (when (y-or-n-p (format "Prompt is %d words; send anyway? " word-count))
      (gptel-send))
  (gptel-send))
This check catches most of the cases where I want confirmation, i.e. accidentally sending a massive input when I call gptel-send from a writing/code buffer, or not realizing how long the conversation chain has gotten.

@karthink (Owner) commented Feb 9, 2025

@axelknock thanks for digging through the documentation! I've made a note.

> If the variation is not too high, for a v0.1 implementation we could simply use the Anthropic count_tokens endpoint for all models to get an estimate.

That requires the user to have an Anthropic API key, I think.

@bharadswami

> That requires the user to have an Anthropic API key, I think.

It does require an Anthropic API key, but token counting is a free API request. There could also be an option to run tiktoken on the client device in case the user wants to perform all operations locally, although I'm not sure there's a way to get token usage for file uploads and tool use through tiktoken.
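(As a last-resort local fallback, the common rule of thumb of roughly four characters per token needs no tokenizer at all, though it is only a coarse approximation:)

;; Crude local fallback: ~4 characters per token is a common rule of thumb.
;; Real tokenizers vary by model; treat this as an order-of-magnitude guess.
(defun my/approx-token-count (text)
  (ceiling (length text) 4))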
