Automatic receipt scanning #23

Closed
rgov opened this issue Jan 6, 2024 · 20 comments · Fixed by #69
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@rgov

rgov commented Jan 6, 2024

To make entering purchases easier, a model (like donut-base-finetuned-cord-v2, demo) could be wired up to provide automatic receipt scanning.

Depending on how big of a model it is, you might be able to run inference entirely client side with ONNX Runtime Web or similar.

@adiso06

adiso06 commented Jan 18, 2024

+1, would like some sort of OCR-based receipt scanning/splitting feature similar to Splitwise. Great app!

Could use the AI approach (which might be expensive) or alternatively just direct OCR:

Python - https://pyimagesearch.com/2021/10/27/automatically-ocring-receipts-and-scans/

@vladartym

Yes, would love to have something like this! Or even an OpenAI API key field that lets users just use OpenAI's vision API.

@scastiel scastiel added the help wanted Extra attention is needed label Jan 19, 2024
@scastiel
Member

That would be an interesting feature, but I really lack skills in all the AI stuff. Marking the issue with the help wanted label 😉

@rgov
Author

rgov commented Jan 19, 2024

@vladartym: Yes, would love to have something like this! Or even an OpenAI API key field that lets users just use OpenAI's vision API.

This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

There must be a zillion Node.js OpenAI API libraries to make it easy.

@vladartym

Yeah, all you would need is an image input (or several) in each expense form. Or a bulk receipt upload that gets structured data back from OpenAI, just the "company", "total amount" and "currency", so the user can later go through all the transactions once they're processed. Would be a great on-the-go feature. I typically have to dedicate some time after my trips to sit and go through all my transactions.

I can help with designs, but I'm not a strong developer.

p.s. Once again super thankful for this app @scastiel


@adiso06

adiso06 commented Jan 20, 2024

I think AI would be overkill. Here's a Node.js API library that takes any image and parses out the line-item names and costs.

Nodejs - https://developers.mindee.com/docs/nodejs-receipt-ocr


const mindee = require("mindee");
// for TS or modules:
// import * as mindee from "mindee";

// Init a new client
const mindeeClient = new mindee.Client({ apiKey: "my-api-key" });

// Load a file from disk
const inputSource = mindeeClient.docFromPath("/path/to/the/file.ext");

// Parse the file
const apiResponse = mindeeClient.parse(
  mindee.product.ReceiptV5,
  inputSource
);

// Handle the response Promise
apiResponse.then((resp) => {
  // print a string summary
  console.log(resp.document.toString());
});


@manuerwin

There appears to be a JavaScript library that might help here?
https://github.com/naptha/tesseract.js/

@scastiel
Member

@adiso06: I think AI would be overkill - here's a nodejs API library, which would take any image and parse it out with lineitem name and cost. Nodejs - https://developers.mindee.com/docs/nodejs-receipt-ocr

Interesting, it looks like what we’re looking for. Note that it costs $0.10/page after 250 pages/month. Not necessarily a problem, but it might become a paid feature in the future on Spliit.app (and an opt-in, bring-your-own-API-key feature if self-hosted).
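
To make the pricing above concrete, here is a small sketch that estimates a monthly bill from the figures quoted in this comment. The function name and constants are hypothetical, and the actual Mindee pricing may differ or have changed since.

```javascript
// Hypothetical helper based only on the figures quoted in this thread:
// $0.10 per page after the first 250 pages each month (an assumption,
// not a statement of Mindee's current pricing).
const FREE_PAGES_PER_MONTH = 250;
const CENTS_PER_EXTRA_PAGE = 10;

function estimateMonthlyCostUsd(pagesPerMonth) {
  const billablePages = Math.max(0, pagesPerMonth - FREE_PAGES_PER_MONTH);
  // Work in whole cents to avoid floating-point rounding surprises.
  return (billablePages * CENTS_PER_EXTRA_PAGE) / 100;
}

for (const pages of [100, 250, 1000]) {
  console.log(`${pages} pages/month -> $${estimateMonthlyCostUsd(pages).toFixed(2)}`);
}
```

Under these assumptions, a small self-hosted instance would likely stay inside the free tier, while a shared public instance could plausibly exceed it.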

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

Might be an option (a cheaper one) as well.

@manuerwin: There appears to be a JavaScript library that might help here? https://github.com/naptha/tesseract.js/

Tesseract does the OCR, but extracting information from the read content remains, and might be the most complex part 😉
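
To illustrate why the extraction step is the hard part, here is a deliberately naive sketch that looks for a "total" line in raw OCR text. The function name, the sample receipt, and the heuristics are all hypothetical; this is not how Spliit or tesseract.js does it.

```javascript
// Hypothetical helper, not part of Spliit or tesseract.js: a naive attempt
// to pull the total out of plain OCR text.
function extractTotal(ocrText) {
  // An amount with two decimals at the end of the line, e.g. "8.10" or "8,10".
  const amountPattern = /(\d+[.,]\d{2})\s*$/;
  let total = null;
  for (const line of ocrText.split(/\r?\n/)) {
    // Skip "Subtotal" lines; keep the last remaining line mentioning "total".
    if (!/total/i.test(line) || /sub\s*-?total/i.test(line)) continue;
    const match = line.match(amountPattern);
    if (match) total = Number(match[1].replace(",", "."));
  }
  return total;
}

// Made-up receipt text standing in for real OCR output.
const sample = [
  "CORNER CAFE",
  "2x Espresso      5.00",
  "Croissant        2.50",
  "Subtotal         7.50",
  "Tax              0.60",
  "TOTAL            8.10",
].join("\n");

console.log(extractTotal(sample)); // 8.1
```

Even this tidy example hides problems: an amount such as "1,234.56" would be misread because of the thousands separator, and receipts in other languages or layouts would not match the "total" keyword at all.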

@vladartym

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

@scastiel: Might be an option (a cheaper one) as well.

I still think an OpenAI API key input is the easiest and cheapest way for us to make it available. Down the road we can monetize this, if we choose to go this path, for people who are not tech savvy and would rather pay to just get it working.

Processing a 1000x1000 image with OpenAI's vision API will cost $0.00765. And the data can be structured based on how you want it returned to you. This also opens up new doors to extract other information in the future.
Some other things we could ask the AI to do:

  • What is the category of the transaction?
  • Get Google Location ID and coordinates of the transaction? (if we ever wanna place it on a map)
  • What currency is this transaction in?
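
The $0.00765 figure above is consistent with the image token rules OpenAI documented for GPT-4 with Vision in early 2024, as far as I understand them. The sketch below reproduces that arithmetic; the constants (85 base tokens, 170 tokens per 512px tile, $0.01 per 1K input tokens) and the function names are assumptions that may no longer match current pricing.

```javascript
// Assumed constants for high-detail images (GPT-4 with Vision, early 2024).
const BASE_TOKENS = 85;
const TOKENS_PER_TILE = 170;
const TILE_SIZE = 512;
const USD_PER_1K_INPUT_TOKENS = 0.01;

function estimateImageTokens(width, height) {
  // Step 1: fit the image within 2048 x 2048, preserving aspect ratio.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  // Step 2: scale down so the shortest side is at most 768px.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w = Math.round(w * shrink);
  h = Math.round(h * shrink);
  // Step 3: count the 512px tiles needed to cover the scaled image.
  const tiles = Math.ceil(w / TILE_SIZE) * Math.ceil(h / TILE_SIZE);
  return BASE_TOKENS + TOKENS_PER_TILE * tiles;
}

function estimateImageCostUsd(width, height) {
  return (estimateImageTokens(width, height) * USD_PER_1K_INPUT_TOKENS) / 1000;
}

const tokens = estimateImageTokens(1000, 1000);
console.log(tokens, estimateImageCostUsd(1000, 1000).toFixed(5)); // 765 0.00765
```

A 1000x1000 image is scaled to 768x768, which needs four tiles: 85 + 4 × 170 = 765 tokens. This covers only the image input; prompt and response tokens would be billed on top.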

@scastiel scastiel added the enhancement New feature or request label Jan 29, 2024
@scastiel
Member

scastiel commented Jan 29, 2024

In #69 I implemented a first version using OpenAI. It seems to work pretty well and costs ~$0.01-0.02/receipt.

If some of you have OpenAI API access (with GPT-4 with Vision), I’d really appreciate some additional tests and feedback.

As I said in the PR, note that I’d really like to focus on making the feature work for now. Later we’ll think more about improving user experience 😉.

Also it’s my first time with OpenAI API and I’m really not an expert with AI, so open to feedback about the implementation here, like the prompt 😅

Screen.Recording.2024-01-29.at.17.45.11.mov

@scastiel scastiel self-assigned this Jan 29, 2024
@vladartym

Lettss gooo!! This is amazing!! 💯 I have vision API access. Is there anywhere live where I can test this?

@scastiel
Member

@vladartym: Lettss gooo!! This is amazing!! 💯 I have vision API access. Is there anywhere live where I can test this?

For now the only way is to run the application locally I’m afraid.

@vladartym

vladartym commented Jan 30, 2024

Is it the receipt-scan branch? I managed to run the project locally via Docker, but I can't find the receipt button anywhere.

@scastiel
Member

@vladartym: Is it the receipt-scan branch? I managed to run the project locally via Docker, but I can't find the receipt button anywhere.

You need to define two environment variables (in container.env if running with Docker):

NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT=true
OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
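
A self-hosted setup could check at startup that these two variables agree. The sketch below is hypothetical: only the two variable names come from this comment, and the function name and messages are made up for illustration.

```javascript
// Hypothetical startup check; only the two environment variable names
// are taken from the thread, everything else is an assumption.
function checkReceiptScanConfig(env) {
  const enabled = env.NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT === "true";
  const problems = [];
  if (enabled && !env.OPENAI_API_KEY) {
    problems.push("OPENAI_API_KEY must be set when receipt extraction is enabled");
  }
  return { enabled, ok: problems.length === 0, problems };
}

const report = checkReceiptScanConfig(process.env);
console.log(report.enabled ? "receipt scanning enabled" : "receipt scanning disabled");
for (const problem of report.problems) console.warn(problem);
```

Note that the flag is compared against the string "true", since environment variables are always strings.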

@vladartym

Ahh sweet, got it! Updated that, and I see the button now.

However, I'm getting an error when uploading a file:

"Something wrong happened when uploading the document. Please retry later or select a different file."

Screenshot 2024-01-29 at 10 16 48 PM

@scastiel
Member

scastiel commented Jan 30, 2024

Forgot to mention you need to enable expense receipts as well: https://github.com/spliit-app/spliit?tab=readme-ov-file#expense-documents (which reminds me that receipt scanning depends on this feature, and the README should mention it).

Edit: actually receipt scanning doesn’t have to depend on expense documents. Although it would make more sense in a production application, it is possible at least for dev to scan receipts without storing them on S3. I’ll work on it.

@vladartym

Haha, that's my bad, I should've read the README better.

I think I'm getting some permission issues now with AWS. I don't want to be a burden with this either. I can wait until this is on prod/staging to test out the feature.

Just a note from what I've seen so far: we'd probably need an input box for storing the OpenAI API key somewhere. I'd also assume it would have to be stored locally (in a cookie?) since there are no user accounts.

Screenshot 2024-01-29 at 11 05 45 PM

@scastiel
Member

scastiel commented Jan 30, 2024

Alright, the feature is merged 🎉

I added a dialog to make it more clear how it works. Feel free to test at https://spliit.app and give your feedback 😉

Screen.Recording.2024-01-30.at.17.10.01.mov

A few remarks:

  • For now, I pay for OpenAI calls on Spliit.app. There is a hard limit in monthly costs; I don’t expect it to be reached unless thousands of people use the feature. I may put in place per-group premium features in the future.
  • If you’re self-hosting, you need to enable S3 document upload if you enable receipt scanning. It should be possible to enable only receipt scanning (reading the image, generating a data-URL, etc.) but I didn’t think it was necessary for now.
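
The data-URL idea mentioned in the last remark could look roughly like this in Node.js. This is a minimal sketch, not Spliit's implementation: the helper name is hypothetical, and a tiny stand-in buffer is used instead of a real receipt photo.

```javascript
// Hypothetical helper: encodes image bytes as a data URL so the image can
// be passed inline instead of being uploaded to S3 first.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Stand-in payload; a real caller would pass the uploaded file's bytes.
const fakeImage = Buffer.from("receipt");
console.log(toDataUrl(fakeImage, "image/jpeg"));
// data:image/jpeg;base64,cmVjZWlwdA==
```

In the browser, `FileReader.readAsDataURL` produces the same kind of string from a selected file. Base64 inflates the payload by about a third, so this is probably best suited to development or small images.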

@vladartym

@scastiel Amazing as always!! Works like a charm, and images that don't have any information simply get attached, which is pretty great!! Per-group premium features are definitely better than requiring every single person to subscribe.

Thanks again for the speedy turnaround on this 😊 🥳

@scastiel
Member

A huge thanks to everyone who participated here! It's because of this kind of collaboration that I love building Spliit as an open source project ❤️

I wrote a short blog post about the feature: Announcing Receipt Scanning Using AI. And so I added a blog to Spliit.app too 😉. Feel free to share it with your community!

5 participants