Automatic receipt scanning #23

rgov · 2024-01-06T22:48:23Z

To make entering purchases easier, a model (like donut-base-finetuned-cord-v2, demo) could be wired up to provide automatic receipt scanning.

Depending on how big of a model it is, you might be able to run inference entirely client side with ONNX Runtime Web or similar.

adiso06 · 2024-01-18T16:35:26Z

+1, would like some sort of OCR-based receipt scanning/splitting feature similar to Splitwise. Great app!

Could use the AI approach (which might be expensive) or alternatively just direct OCR -

Python - https://pyimagesearch.com/2021/10/27/automatically-ocring-receipts-and-scans/

vladartym · 2024-01-19T21:38:03Z

Yes, would love to have something like this! Or even an OpenAI API key field that lets users just use OpenAI's vision API.

scastiel · 2024-01-19T21:46:35Z

That would be an interesting feature, but I really lack skills in all the AI stuff. Marking the issue with help wanted label 😉

rgov · 2024-01-19T21:51:04Z

@vladartym: Yes, would love to have something like this! Or even an OpenAI API key field that lets users just use OpenAI's vision API.

This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

There must be a zillion Node.js OpenAI API libraries to make it easy.

vladartym · 2024-01-19T21:58:30Z

Yeah, all you would need is an image input (or several) in each expense form. Or a bulk receipt upload that gets structured data back from OpenAI with just the "company", "total amount", and "currency", so the user can later go through all the transactions once they're processed. Would be a great on-the-go feature. I typically have to dedicate some time after my trips to sit and go through all my transactions.

I can help with designs, but I'm not a strong developer.

p.s. Once again super thankful for this app @scastiel

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

adiso06 · 2024-01-20T00:53:18Z

I think AI would be overkill - here's a Node.js API library that takes any image and parses it into line-item names and costs.

Nodejs - https://developers.mindee.com/docs/nodejs-receipt-ocr


const mindee = require("mindee");
// for TS or modules:
// import * as mindee from "mindee";

// Init a new client
const mindeeClient = new mindee.Client({ apiKey: "my-api-key" });

// Load a file from disk
const inputSource = mindeeClient.docFromPath("/path/to/the/file.ext");

// Parse the file
const apiResponse = mindeeClient.parse(
  mindee.product.ReceiptV5,
  inputSource
);

// Handle the response Promise
apiResponse.then((resp) => {
  // print a string summary
  console.log(resp.document.toString());
});

manuerwin · 2024-01-20T04:35:47Z

There appears to be a JavaScript library that might help here?
https://github.com/naptha/tesseract.js/

scastiel · 2024-01-20T05:03:29Z

@adiso06: I think AI would be overkill - here's a nodejs API library, which would take any image and parse it out with lineitem name and cost. Nodejs - https://developers.mindee.com/docs/nodejs-receipt-ocr

Interesting, it looks like what we’re looking for. Note that it costs $0.10/page after 250 pages/month. Not necessarily a problem, but it might become a paid feature in the future on Spliit.app (and an opt-in, bring-your-own-API-key feature if self-hosted).

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

Might be an option (a cheaper one) as well.

@manuerwin: There appears to be a JavaScript library that might help here? https://github.com/naptha/tesseract.js/

Tesseract does the OCR, but extracting information from the read content remains, and might be the most complex part 😉
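
To illustrate why the extraction step is the hard part, here is a minimal sketch of a heuristic that looks for a "total" line in raw OCR text. The helper name `extractTotal`, the keyword match, and the sample receipt are all made up for illustration; real receipts vary far more than this.

```javascript
// Hypothetical helper: pick the most likely "total" amount out of raw OCR text.
// The keyword match and the number format are assumptions, not a general solution.
function extractTotal(ocrText) {
  // An amount with two decimals at the end of a line, e.g. "12.50" or "12,50".
  const amountPattern = /(\d+[.,]\d{2})\s*$/;
  let best = null;
  for (const rawLine of ocrText.split(/\r?\n/)) {
    const line = rawLine.trim();
    const match = line.match(amountPattern);
    if (!match) continue;
    const amount = Number(match[1].replace(',', '.'));
    // Keep lines that mention "total" as a whole word; skip "sub total" variants.
    const isTotal = /\btotal\b/i.test(line) && !/sub\s*-?\s*total/i.test(line);
    if (isTotal) best = amount; // the last matching line wins
  }
  return best;
}

const sample = [
  'CAFE EXAMPLE',
  'Espresso        3.00',
  'Croissant       2.50',
  'Subtotal        5.50',
  'Tax             0.55',
  'TOTAL           6.05',
].join('\n');

console.log(extractTotal(sample)); // 6.05
```

Even this small sketch breaks on thousands separators ("1,234.56"), translated keywords, and OCR misreads, which is the kind of complexity a dedicated receipt model or a vision API would absorb.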

vladartym · 2024-01-20T05:27:15Z

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

@scastiel: Might be an option (a cheaper one) as well.

I still think an OpenAI API key input is the easiest and cheapest way for us to make it available. Down the road, if we choose this path, we can monetize it for people who are not tech savvy and would rather pay to get this working.

Processing a 1000x1000 image with OpenAI vision will cost $0.00765, and the data can be structured however you want it returned. This also opens the door to extracting other information in the future.
Some other things we could ask the AI:

What is the category of the transaction?
Get Google Location ID and coordinates of the transaction? (if we ever wanna place it on a map)
What currency is this transaction in?
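
For reference, the $0.00765 figure can be reproduced with a short calculation. This is a sketch that assumes the high-detail image pricing rules as described in OpenAI's vision guide in early 2024 (85 base tokens, 170 tokens per 512 px tile, $0.01 per 1K input tokens); the constants and function names are illustrative and should be checked against current pricing.

```javascript
// Assumed pricing rules (OpenAI vision guide, early 2024); verify before relying on them.
const BASE_TOKENS = 85;
const TOKENS_PER_TILE = 170;
const TILE_SIZE = 512;
const USD_PER_1K_INPUT_TOKENS = 0.01;

function visionImageTokens(width, height) {
  // 1. Scale down (never up) so the image fits within 2048 x 2048.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  // 2. Scale down (never up) so the shortest side is at most 768 px.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w = Math.round(w * shrink);
  h = Math.round(h * shrink);
  // 3. Count the 512 px tiles needed to cover the result.
  const tiles = Math.ceil(w / TILE_SIZE) * Math.ceil(h / TILE_SIZE);
  return BASE_TOKENS + TOKENS_PER_TILE * tiles;
}

function visionImageCostUsd(width, height) {
  return (visionImageTokens(width, height) * USD_PER_1K_INPUT_TOKENS) / 1000;
}

console.log(visionImageTokens(1000, 1000)); // 765
console.log(visionImageCostUsd(1000, 1000).toFixed(5)); // "0.00765"
```

A 1000x1000 image is scaled to 768x768, which covers four tiles: 85 + 4 × 170 = 765 tokens, or $0.00765 at the assumed rate.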

scastiel · 2024-01-29T22:46:07Z

In #69 I implemented a first version using OpenAI. It seems to work pretty well and costs ~$0.01-0.02/receipt.

If some of you have OpenAI API access (with GPT-4 with Vision), I’d really appreciate some additional tests and feedback.

As I said in the PR, note that I’d really like to focus on making the feature work for now. Later we’ll think more about improving user experience 😉.

Also it’s my first time with OpenAI API and I’m really not an expert with AI, so open to feedback about the implementation here, like the prompt 😅

Screen.Recording.2024-01-29.at.17.45.11.mov
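
For readers who want to experiment with the prompt, here is a rough sketch of what a vision request and a defensive reply parser could look like. It is not the code from #69: the function names, prompt wording, model name, and JSON reply format are assumptions chosen for illustration.

```javascript
// Sketch only: builds a Chat Completions request body for a vision-capable model.
// The model name, prompt wording, and reply format are assumptions for illustration.
function buildReceiptRequest(imageUrl) {
  return {
    model: 'gpt-4-vision-preview',
    max_tokens: 300,
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text:
              'This image contains a receipt. Reply with only a JSON object with the keys ' +
              '"title" (string), "amount" (number), "currency" (ISO 4217 code) and ' +
              '"date" (YYYY-MM-DD). Use null for anything you cannot read.',
          },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Parse the model's text reply defensively: it may be wrapped in extra prose.
function parseReceiptReply(replyText) {
  const start = replyText.indexOf('{');
  const end = replyText.lastIndexOf('}');
  if (start === -1 || end <= start) return null;
  try {
    const data = JSON.parse(replyText.slice(start, end + 1));
    return {
      title: typeof data.title === 'string' ? data.title : null,
      amount: typeof data.amount === 'number' ? data.amount : null,
      currency: typeof data.currency === 'string' ? data.currency : null,
      date: typeof data.date === 'string' ? data.date : null,
    };
  } catch {
    return null; // not valid JSON: let the user fill in the expense manually
  }
}
```

The request body would be sent as JSON to the Chat Completions endpoint with the API key in an `Authorization: Bearer` header. Returning `null` on anything unexpected keeps a bad reply from blocking expense creation.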

vladartym · 2024-01-29T23:31:01Z

Lettss gooo!! This is amazing!! 💯 I have vision API access. Is there anywhere I can test this that's live?

scastiel · 2024-01-29T23:37:55Z

@vladartym: Lettss gooo!! This is amazing!! 💯 I have vision API access. Is there anywhere I can test this that's live?

For now the only way is to run the application locally I’m afraid.

vladartym · 2024-01-30T02:54:33Z

Is it the receipt-scan branch? I managed to open up the project locally via docker. But can't find the receipt button anywhere.

scastiel · 2024-01-30T03:02:02Z

@vladartym: Is it the receipt-scan branch? I managed to open up the project locally via docker. But can't find the receipt button anywhere.

You need to define two environment variables (in container.env if running with Docker):

NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT=true
OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX

vladartym · 2024-01-30T03:17:27Z

Ahh sweet got it! Updated that, I see the button now.

However, I'm getting an error when uploading a file:

"Something wrong happened when uploading the document. Please retry later or select a different file."

scastiel · 2024-01-30T03:30:50Z

Forgot to mention you need to enable expense receipts as well: https://github.com/spliit-app/spliit?tab=readme-ov-file#expense-documents (which reminds me that receipt scanning depends on this feature, and the README should mention it).

Edit: actually receipt scanning doesn’t have to depend on expense documents. Although it would make more sense in a production application, it is possible at least for dev to scan receipts without storing them on S3. I’ll work on it.

vladartym · 2024-01-30T04:09:40Z

Haha, that's my bad, I should've read the README better.

I think I'm getting some permission issues now with AWS. I don't want to be a burden with this either. I can wait until this is on prod/staging to test out the feature.

Just a note from what I've seen so far: we'd probably need an input box somewhere for storing the OpenAI API key. I'd also assume it would have to be stored locally (in a cookie?) since there are no user accounts.

scastiel · 2024-01-30T22:11:23Z

Alright, the feature is merged 🎉

I added a dialog to make it more clear how it works. Feel free to test at https://spliit.app and give your feedback 😉

Screen.Recording.2024-01-30.at.17.10.01.mov

A few remarks:

For now, I pay for OpenAI calls on Spliit.app. There is a hard limit in monthly costs; I don’t expect it to be reached unless thousands of people use the feature. I may put in place per-group premium features in the future.
If you’re self-hosting, you need to enable S3 document upload if you enable receipt scanning. It should be possible to enable only receipt scanning (reading the image, generating a data URL, etc.), but I didn’t think it was necessary for now.
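
For self-hosters, the dependency described above can be expressed as a small configuration check. This is only a sketch: `checkReceiptScanningConfig` is a made-up helper, and `NEXT_PUBLIC_ENABLE_EXPENSE_DOCUMENTS` is an assumed name for the expense-documents flag, so check the README for the actual variable names.

```javascript
// Sketch of a startup check for the flags discussed in this thread.
// NEXT_PUBLIC_ENABLE_EXPENSE_DOCUMENTS is an assumed name for the expense-documents
// flag; NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT and OPENAI_API_KEY come from the thread.
function checkReceiptScanningConfig(env) {
  const problems = [];
  const scanningRequested = env.NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT === 'true';
  if (!scanningRequested) return { enabled: false, problems };

  if (!env.OPENAI_API_KEY) {
    problems.push('OPENAI_API_KEY is not set');
  }
  if (env.NEXT_PUBLIC_ENABLE_EXPENSE_DOCUMENTS !== 'true') {
    problems.push('expense documents (S3 upload) must be enabled too');
  }
  return { enabled: problems.length === 0, problems };
}

// Example: scanning requested, but expense documents not enabled.
const result = checkReceiptScanningConfig({
  NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT: 'true',
  OPENAI_API_KEY: 'XXXXXXXX',
});
console.log(result.enabled, result.problems);
```

In a real deployment the function would be called with `process.env`; passing the environment in as an argument keeps the check easy to test.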

vladartym · 2024-01-30T23:50:51Z

@scastiel Amazing as always!! Works like a charm, and images that don't have any information simply get attached, which is pretty great!! Per-group premium features are def better than requiring every single person to subscribe.

Thanks again for the speedy turn around on this 😊 🥳

scastiel · 2024-01-31T21:52:41Z

A huge thanks to everyone who participated here! It is because of this kind of collaboration that I love building Spliit as an open source project ❤️

I wrote a short blog post about the feature: Announcing Receipt Scanning Using AI. And so I added a blog to Spliit.app too 😉. Feel free to share it with your community!

scastiel added the help wanted Extra attention is needed label Jan 19, 2024

scastiel mentioned this issue Jan 29, 2024

Create expense from receipt #69

Merged

scastiel added the enhancement New feature or request label Jan 29, 2024

scastiel self-assigned this Jan 29, 2024

scastiel closed this as completed in #69 Jan 30, 2024
