Error when using pix2text 1.0 #625

yjbsl · 2024-08-14T05:55:05Z

Issues

I have browsed through the Issues and confirmed this is not a duplicate question.

Umi-OCR version

Tested on both 2.1.3 and 2.1.2

Windows version

Windows 11

OCR plugins used

Pix2Text

Reproduction steps

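As shown in the screenshot, OCR of this book gets stuck partway through and stays unresponsive even after a long wait. The error printed in the CLI is: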

[Error] Error during asynchronous run: Traceback (most recent call last):
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\utils\thread_pool.py", line 22, in run
    self._taskFunc(*self._args, **self._kwargs)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\mission\mission.py", line 238, in _taskRun
    res = self.msnTask(msnInfo, msn)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\mission\mission_doc.py", line 262, in msnTask
    tbs = tbpu.run(tbs)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\ocr\tbpu\parser_multi_para.py", line 30, in run
    self.pp.run(tbs)  # predict the trailing separator of each line
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\ocr\tbpu\parser_tools\paragraph_parse.py", line 61, in run
    units = self._get_units(text_blocks, self.get_info)
  File "Umi-OCR_Paddle_v2.1.3\UmiOCR-data\py_src\ocr\tbpu\parser_tools\paragraph_parse.py", line 72, in _get_units
    units.append((bbox, (text[0], text[-1]), tb))
IndexError: string index out of range
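The last frame is the telling one: _get_units builds (text[0], text[-1]) from each text block, and indexing an empty Python string raises exactly this IndexError. A minimal sketch reproducing it, assuming a block's text can come back as an empty string (get_edge_chars is a hypothetical helper that only mirrors the expression from the traceback, not Umi-OCR's actual code):

```python
from typing import Tuple


def get_edge_chars(text: str) -> Tuple[str, str]:
    # Mirrors the (text[0], text[-1]) expression in _get_units
    return (text[0], text[-1])


print(get_edge_chars("概率论"))  # ('概', '论')

try:
    get_edge_chars("")  # a block whose recognized text is empty
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range
```

Any non-empty string works; the first empty one aborts the whole paragraph-parsing step, which matches the task stalling in the UI.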

Problem screenshots or related files (optional)

测度论与概率论基础 (程士宏编著) (Z-Library).pdf

hiroi-sora · 2024-08-16T10:02:28Z

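Thanks for the report. This exception is caused by non-standard output items from P2T (Pix2Text). You can fix the bug by updating the code manually: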

Open UmiOCR-data\py_src\ocr\tbpu\parser_tools\line_preprocessing.py

At line 85, directly below the def linePreprocessing line, add one line of code:

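def linePreprocessing(textBlocks):
    # Added line: drop blocks whose "text" is missing or empty
    textBlocks = [i for i in textBlocks if i.get("text", False)]
    # ... (rest of the original function body unchanged)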
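The added line is a list comprehension that keeps a block only when i.get("text", False) is truthy, so blocks with a missing "text" key, an empty string, or None are all dropped before any text[0] / text[-1] access. A small sketch of its effect; drop_empty_blocks is a hypothetical wrapper, and the sample blocks (including the "box" and "score" keys) are made up for illustration, since only "text" matters to the filter:

```python
def drop_empty_blocks(text_blocks):
    # Same filter as the line added to linePreprocessing:
    # keep a block only if its "text" value exists and is non-empty.
    return [i for i in text_blocks if i.get("text", False)]


# Hypothetical engine output; only the "text" key matters here.
blocks = [
    {"text": "测度论", "box": [[0, 0], [60, 0], [60, 20], [0, 20]], "score": 0.98},
    {"text": "", "box": [[0, 30], [10, 30], [10, 50], [0, 50]], "score": 0.0},
    {"box": [[0, 60], [10, 60], [10, 80], [0, 80]], "score": 0.0},  # no "text" key
]

cleaned = drop_empty_blocks(blocks)
print([b["text"] for b in cleaned])  # ['测度论']

# Every remaining block is now safe for (text[0], text[-1]).
units = [(b["text"][0], b["text"][-1]) for b in cleaned]
print(units)  # [('测', '论')]
```

The filter only targets missing or empty values; a whitespace-only string such as " " is truthy and would still pass through.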

As shown in the screenshot.

This fix will be included in the next release.

yjbsl · 2024-08-17T14:17:59Z

Thanks a lot for the fix!

hiroi-sora added a commit that referenced this issue Aug 16, 2024

Bug fix: an empty "text" field in the engine's raw output items caused an out-of-range error in text analysis (#625)

cd7974f

yjbsl closed this as completed Aug 17, 2024
