Batch OCR Invoice PDFs and extract text to input into web based fields - can but hard #304

Domsdorm · 2021-09-14T06:08:34Z

Hi! I am trying to develop an RPA system in which manually scanned invoices (PDFs) are OCR-ed and specific text is extracted to fill in fields in a web-based form. Is TagUI able to do this?

A little background: I'm a uni student on an internship and was given this task. I came across your tool while studying, was really impressed, and am trying to use it for the task described above.

Thank you for your help!!

kensoh · 2021-09-15T07:44:07Z

Hi @Domsdorm, see the link below for a full solution and demo of what you mentioned. It is an automation script that solves the Automation Anywhere Week 4 RPA challenge. The tough part is converting the unstructured image data into structured data. This is very tough to get right and involves a lot of trial and error. The link below covers the considerations and options for doing this.

aisingapore/TagUI#1093 (comment)

Domsdorm · 2021-09-15T14:34:22Z

Thanks @kensoh for the reply! I am currently trying to get around my company's firewall using the steps you described in the Telegram group. I will update if I run into any trouble with the code.

Really appreciate you taking your own free time to help. Cheers!

Domsdorm · 2021-09-16T03:16:36Z

I am trying to run the code for the Week 4 RPA challenge, but OpenJDK is needed. Is there any way to bypass this?

Also, is there a way to check in the script whether the invoice data (for example, the invoice number) is being pulled correctly?

kensoh · 2021-09-16T06:35:55Z

For the 1st question, OpenJDK / Java 64-bit is needed for the part that opens the file explorer to choose a file. But there is a workaround: you can use r.upload() to choose the file without opening the file browser (using the file browser was a criterion set by the organiser for the challenge). See this solution from another user - https://github.com/DanielCCF/BotGamesAA/blob/master/Week4/Solution-Python.py#L125
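A minimal sketch of the r.upload() workaround, for illustration only. The helper name `upload_invoice`, the CSS selector `#fileToUpload`, and the file name are assumptions and not taken from the linked solution; `bot` stands for the imported module (`import rpa as r`), passed in so the helper can be tried without a browser session.

```python
from pathlib import Path


def upload_invoice(bot, css_selector, file_path):
    """Attach file_path to the file input matched by css_selector.

    bot is expected to be the rpa module (import rpa as r); it is passed
    in as a parameter so the helper can be exercised with a stand-in.
    """
    path = Path(file_path).resolve()
    # Fail early with a clear error instead of uploading a missing file.
    if not path.is_file():
        raise FileNotFoundError(path)
    # r.upload(element_identifier, filename_to_upload) fills the file
    # input directly, so no file explorer window has to be opened.
    return bot.upload(css_selector, str(path))


def demo():
    """Hypothetical usage; the URL, selector and file name are placeholders."""
    import rpa as r  # imported here so the helper above works without rpa

    r.init()
    r.url("https://example.com/upload-form")
    upload_invoice(r, "#fileToUpload", "invoice_001.png")
    r.close()
```

`demo()` is not called automatically; it would need to be pointed at the real form and at the selector of that form's file input.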

For the 2nd question, this RPA package requires the user to know Python. I'm assuming you are new to Python, which is why you ask. The answer is already written in the Python script itself: the OCR of the image files and the extraction of individual data such as the invoice number. This automation is hard to understand and do without Python knowledge. Most of it is Python programming; only some parts are RPA concepts related to this tool.
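As a rough illustration of checking what was pulled, here is a small sketch that takes OCR output as plain text, extracts a couple of fields with regular expressions, and reports which ones are missing before anything is typed into a form. The field names, the label patterns, and the sample text are assumptions made up for this example; they are not from the challenge script and would need adjusting to the real invoices.

```python
import re

# Hypothetical label patterns; adjust them to the wording on your invoices.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"Date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
    ),
}


def extract_fields(ocr_text):
    """Return a dict of field name -> extracted value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


def missing_fields(fields):
    """List the fields that could not be extracted from the OCR text."""
    return [name for name, value in fields.items() if not value]


# Made-up OCR output standing in for the text of one scanned invoice.
sample_text = "ACME Pte Ltd\nInvoice No: INV-2021-0042\nDate: 14/09/2021\nTotal: 120.50"
fields = extract_fields(sample_text)
print(fields)
print("Missing:", missing_fields(fields))
```

Printing the extracted values for each invoice, and skipping or logging the ones with missing fields, is one simple way to see whether the invoice number is being read correctly.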

Domsdorm · 2021-09-16T10:59:24Z

Yupp I'm not from a CS background but am interested in learning more. Thank you for your patience tho 😅

Domsdorm · 2021-09-22T08:02:12Z

I am currently trying to select a dropdown option (Incoming WO). I am able to select the main header but unable to select the Incoming WO option.

r.click('//*[@aria-haspopup="hauptMenu:submenu:12"]')
r.wait(1)
r.click('//*[@tabindex="-1"][contains(text(), "Incoming WO")]')

Is there a way to select that option? Or could I use r.click(x, y)? If r.click(x, y) is possible, how do I know which x, y values to input?

kensoh · 2021-09-22T08:32:13Z

You can try whether r.select() works - see more on usage and examples in the API section.
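If r.select() does not fit because the menu is not a standard dropdown list, clicking the item by its visible text may be an alternative. The sketch below is only an assumption about how that could look: `xpath_for_text` and `open_submenu_item` are hypothetical helper names, and `bot` stands for the imported module (`import rpa as r`), whose click() and wait() calls are used.

```python
def xpath_for_text(visible_text, tag="*"):
    """Build an XPath that matches an element by its trimmed visible text."""
    if '"' in visible_text:
        raise ValueError("double quotes in the text are not handled by this sketch")
    return f'//{tag}[normalize-space(text())="{visible_text}"]'


def open_submenu_item(bot, header_xpath, item_text, pause_seconds=1):
    """Click the menu header, wait for the submenu, then click the item.

    bot is expected to be the rpa module (import rpa as r).
    """
    bot.click(header_xpath)
    bot.wait(pause_seconds)  # give the submenu time to render
    return bot.click(xpath_for_text(item_text))
```

With a browser session already started, a call might look like `open_submenu_item(r, '//*[@aria-haspopup="hauptMenu:submenu:12"]', "Incoming WO")`, reusing the header XPath from the question.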

kensoh added the query label Sep 15, 2021

kensoh changed the title ~~Is it possible to batch OCR Invoice PDFs and extract specific texts from them to input into web based fields?~~ Batch OCR Invoice PDFs and extract text to input into web based fields - can but hard Sep 15, 2021

kensoh closed this as completed Jan 11, 2022
