Skip to content

mercoa-finance/llm-document-ocr

Repository files navigation

llm-document-ocr

npm version npm downloads license


Sponsored by Mercoa, the API for BillPay and Invoicing. Everything you need to launch accounts payable in your product with a single API!


LLM Based OCR and Document Parsing for Node.js. Uses GPT4 and Claude3 for OCR and data extraction.

  • Converts PDFs (including multi page PDFs) into PNGs for use with GPT4
  • Automatically crops white-space to create smaller inputs
  • Cleans up JSON string returned by the LLM and converts it to an JSON object
  • Custom prompt support for capturing any data you need

Supports:

  • ✅ PNG
  • ✅ WEBP
  • ✅ JPEG / JPG
  • ✅ GIF
  • ✅ PDF
  • ✅ Multi-page PDF
  • ❌ DOC
  • ❌ DOCX

Installation

npm i --save llm-document-ocr
yarn add llm-document-ocr

Note: If you are deploying via Docker, see the Dockerfile for an example Alpine base image.

Usage

import { DocumentOcr, prompts } from "llm-document-ocr";

const documentOcr = new DocumentOcr({
  apiKey: 'YOUR-OpenAi/Anthropic-API-KEY' // required, defaults to process.env.OPENAI_API_KEY. OpenAI models need an OpenAI API key, Antrhopic models need an Anthropic API key.
  model: "gpt-4o", // optional, defaults to "gpt-4-turbo". Options are: "gpt-4-turbo", "gpt-4o", "claude-3-opus-20240229", "claude-3-sonnet-20240229", "claude-3-haiku-20240307"
  standardFontDataUrl: "https://unpkg.com/[email protected]/standard_fonts/", // optional, defaults to "https://unpkg.com/[email protected]/standard_fonts/". You can use the systems fonts or the fonts under ./node_modules/pdfjs-dist/standard_fonts/ as well.
});

const documentData = await documentOcr.process({
  model: "gpt-4o", // optional, defaults to model defined in constructor
  document: 'JVBERi0xLjMNCiXi48/TDQoNCjEgMCBvYmoNCjw8DQ...', // Base64 String, Base64 URI, or Buffer
  mimeType: 'application/pdf', // mime-type of the document or image
  prompt: 'invoiceStartDate, invoiceEndDate, amount', // system prompt for data extraction. See examples below.
  pageOptions: 'FIRST_AND_LAST' // optional, defaults to 'ALL'. Determines which page of the PDF will be processed. Available options are 'ALL', 'FIRST_AND_LAST', 'FIRST', 'LAST'.
})

Prompts

Prompts will be automatically prefixed to tell the LLM to return JSON. You will need to specify the data you wish to extract, and the LLM will return a JSON object with those keys.

For example, the prompt we use at Mercoa for invoice processing is the following:

`invoice number, invoice amount, currency (as ISO 4217 code), dueDate, invoiceDate, serviceStartDate, serviceEndDate,
  vendor's [name, email with @, website],
  line items [amnt, price, qty, des, name, cur (as ISO 4217 code)]`;

And this returns a JSON object that looks like:

{
  invoiceNumber?: string | number
  invoiceAmount?: string | number
  currency?: string
  dueDate?: string
  invoiceDate?: string
  serviceStartDate?: string
  serviceEndDate?: string
  vendor: {
    name?: string
    email?: string
    website?: string
  }
  lineItems: Array<{
    des?: string
    qty?: string | number
    price?: string | number
    amnt?: string | number
    name?: string
    cur?: string
  }>
}

Issues and Contributing

If you encounter a bug or want to see something added/changed, please go ahead and open an issue

If you wish to contribute to the library, thanks! Please see the CONTRIBUTING guide for more details.

License

MIT © Mercoa, Inc

About

LLM Based OCR and Document Parsing for Node.js

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published