oslowy/ocr-repository

OCR Pipeline with swappable image pre-processing

The code in the /google folder runs on Google Cloud Functions. See requirements.txt for the Google Cloud Platform dependencies.

The main.py script contains three entry points, one for each of the three Google Cloud Functions. Each function is defined separately in a .py file of the same name.

  • batch: triggered by an HTTP request. Loads images from a storage bucket and fans them out into threads.
  • process: triggered by a Pub/Sub message. Runs custom image pre-processing (currently a no-op).
  • detect: triggered by a Pub/Sub message. Sends images to the Vision API for OCR and stores the detected text in a bucket.
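The hand-off from batch to the later stages can be sketched as follows. This is a minimal illustration rather than the repository's code: it assumes the fan-out emits one message per image, and the payload fields (`bucket`, `name`), the suffix list, and the injected `publish` callable are all hypothetical stand-ins for whatever the real functions use.

```python
import json
from typing import Callable, Iterable, List

# Assumed set of image suffixes; the repository may filter differently.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def build_messages(bucket: str, object_names: Iterable[str]) -> List[bytes]:
    """Return one JSON-encoded payload per image object in the bucket."""
    return [
        json.dumps({"bucket": bucket, "name": name}).encode("utf-8")
        for name in object_names
        if name.lower().endswith(IMAGE_SUFFIXES)
    ]


def fan_out(
    bucket: str,
    object_names: Iterable[str],
    publish: Callable[[bytes], None],
) -> int:
    """Hand every image payload to `publish` and return how many were sent.

    `publish` is injected so the sketch runs without any cloud client;
    in a deployment it would wrap the message-passing service.
    """
    messages = build_messages(bucket, object_names)
    for payload in messages:
        publish(payload)
    return len(messages)
```

Passing `publish` in as a parameter keeps the fan-out logic testable locally, for example with `fan_out("my-bucket", names, sent.append)` where `sent` is a plain list.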

One more file contains helper code that is not specific to any single function:

  • message.py
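As an illustration of the kind of helper such a file might hold, the sketch below encodes and decodes message payloads. It relies on the fact that a Pub/Sub-triggered background function receives the message body base64-encoded under the event's `data` key. The function names and the JSON payload format are assumptions, not taken from message.py.

```python
import base64
import json
from typing import Any, Dict


def encode_payload(payload: Dict[str, Any]) -> bytes:
    """Serialize a payload dict to the bytes that would be published."""
    return json.dumps(payload).encode("utf-8")


def decode_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Recover the payload dict from a Pub/Sub-triggered function event.

    The event carries the published bytes base64-encoded in event["data"].
    """
    raw = base64.b64decode(event["data"])
    return json.loads(raw.decode("utf-8"))
```

A process or detect entry point could call `decode_event(event)` first and then work with the resulting dict, keeping the wire format in one place.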

The code depends on several platform-specific and implementation-specific details:

  • Argument and data structure formats for Google Cloud services such as Pub/Sub, Storage, and Vision.
  • Environment variables. Other implementations should define these in the settings of their serverless compute unit.
  • Names of message-passing topics and storage buckets.
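One way to keep these details in a single place is to read them from environment variables at startup. The sketch below is only an assumption about how that could look: the variable names (`INPUT_BUCKET`, `OUTPUT_BUCKET`, `PROCESS_TOPIC`, `DETECT_TOPIC`) are hypothetical and are not the names used in this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineConfig:
    input_bucket: str
    output_bucket: str
    process_topic: str
    detect_topic: str


# Hypothetical variable names, one per config field.
_ENV_NAMES = {
    "input_bucket": "INPUT_BUCKET",
    "output_bucket": "OUTPUT_BUCKET",
    "process_topic": "PROCESS_TOPIC",
    "detect_topic": "DETECT_TOPIC",
}


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Build the config from `env`, failing early if anything is missing."""
    missing = [name for name in _ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return PipelineConfig(**{field: env[name] for field, name in _ENV_NAMES.items()})
```

Failing at load time with the list of missing names makes a misconfigured deployment easier to diagnose than a later lookup error inside a function.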
