Error when using pix2text 1.0 #625

Closed
1 task done
yjbsl opened this issue Aug 14, 2024 · 2 comments

Comments

@yjbsl

yjbsl commented Aug 14, 2024

Issues

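  • I have browsed through the Issues and confirmed that this is not a duplicate.

Umi-OCR version

Tested on both 2.1.3 and 2.1.2

Windows version

Windows 11

OCR plugins used

Pix2Text

Reproduction steps

Settings
As shown in the screenshot, OCR of this book got stuck partway through and did not respond even after a long wait. The error reported in the CLI is: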

[Error] 异步运行发生错误: Traceback (most recent call last):
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\utils\thread_pool.py", line 22, in run
    self._taskFunc(*self._args, **self._kwargs)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\mission\mission.py", line 238, in _taskRun
    res = self.msnTask(msnInfo, msn)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\mission\mission_doc.py", line 262, in msnTask
    tbs = tbpu.run(tbs)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\ocr\tbpu\parser_multi_para.py", line 30, in run
    self.pp.run(tbs)  # 预测结尾分隔符
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\ocr\tbpu\parser_tools\paragraph_parse.py", line 61, in run
    units = self._get_units(text_blocks, self.get_info)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\ocr\tbpu\parser_tools\paragraph_parse.py", line 72, in _get_units
    units.append((bbox, (text[0], text[-1]), tb))
IndexError: string index out of range
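
The last frame of the traceback indexes text[0] and text[-1]. In Python, that raises exactly this IndexError when text is an empty string. A minimal sketch of the failure, assuming a text block is a dict with a "text" key (the helper name first_last_chars is hypothetical and is not part of Umi-OCR):

```python
# Hypothetical helper that mirrors the indexing seen in the traceback.
def first_last_chars(tb):
    text = tb["text"]
    return text[0], text[-1]  # fails when text == ""

# A block with non-empty text works as expected.
ok = first_last_chars({"text": "abc"})

# A block with empty text raises IndexError: string index out of range.
try:
    first_last_chars({"text": ""})
    raised = False
except IndexError:
    raised = True
```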

Problem screenshots or related files (optional)

测度论与概率论基础 (程士宏编著) (Z-Library).pdf

@hiroi-sora
Owner

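Thanks for reporting this. The exception is caused by non-standard output items from P2T. You can fix the bug by patching the code manually:

Open UmiOCR-data\py_src\ocr\tbpu\parser_tools\line_preprocessing.py

At line 85, directly below the def line of the linePreprocessing function, add one line of code: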

def linePreprocessing(textBlocks):
    textBlocks = [i for i in textBlocks if i.get("text", False)]

As shown in the screenshot:

image

This fix will be included in the next release.
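
To illustrate what the added line does, here is a small standalone sketch. It assumes each text block is a dict that may carry a "text" key. The sample data and the function name drop_empty_blocks are made up for this example and are not taken from Umi-OCR.

```python
def drop_empty_blocks(text_blocks):
    """Keep only blocks whose "text" value is present and non-empty."""
    # Same filter expression as the one-line patch: a missing key, None,
    # or "" are all falsy, so those blocks are dropped.
    return [tb for tb in text_blocks if tb.get("text", False)]


# Made-up sample input: one normal block and three non-standard ones.
sample = [
    {"text": "Chapter 1"},
    {"text": ""},      # empty string
    {"text": None},    # explicit None
    {"score": 0.9},    # no "text" key at all
]

kept = drop_empty_blocks(sample)

# After filtering, indexing the first and last character is safe.
ends = [(tb["text"][0], tb["text"][-1]) for tb in kept]
```

The filter removes the offending blocks before the paragraph parser sees them, so no block with empty text reaches the later text[0] access.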

@yjbsl
Author

yjbsl commented Aug 17, 2024

Thanks a lot for the fix!

yjbsl closed this as completed Aug 17, 2024