Error when running the demo #155

Closed
fanzz1 opened this issue Jul 16, 2024 · 16 comments
Labels
bug Something isn't working P1 P1 BUG

Comments

@fanzz1

fanzz1 commented Jul 16, 2024

Description of the bug

2024-07-16 11:22:00.324 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 14, text_len: 26394, cid_chars_radio: 0.0005324003650745361
2024-07-16 11:22:00.329 | ERROR | magic_pdf.user_api:parse_pdf:85 - list index out of range
Traceback (most recent call last):

File "E:\MinerU-master\demo\demo.py", line 23, in <module>
pipe.pipe_parse()
│ └ <function UNIPipe.pipe_parse at 0x000001FE33EDD990>
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>

File "E:\MinerU-master\magic_pdf\pipe\UNIPipe.py", line 35, in pipe_parse
self.pdf_mid_data = parse_union_pdf(self.pdf_bytes, self.model_list, self.image_writer,
│ │ │ │ │ │ │ │ └ <magic_pdf.rw.DiskReaderWriter.DiskReaderWriter object at 0x000001FE03BA2E60>
│ │ │ │ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
│ │ │ │ │ │ └ []
│ │ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
│ │ │ │ └ b'%PDF-1.5\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec...
│ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
│ │ └ <function parse_union_pdf at 0x000001FE33EDD750>
│ └ None
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>

File "E:\MinerU-master\magic_pdf\user_api.py", line 88, in parse_union_pdf
pdf_info_dict = parse_pdf(parse_pdf_by_txt)
│ └ <function parse_pdf_by_txt at 0x000001FE33EDD630>
└ <function parse_union_pdf.<locals>.parse_pdf at 0x000001FE037F3E20>

File "E:\MinerU-master\magic_pdf\user_api.py", line 77, in parse_pdf
return method(
└ <function parse_pdf_by_txt at 0x000001FE33EDD630>

File "E:\MinerU-master\magic_pdf\pdf_parse_by_txt.py", line 12, in parse_pdf_by_txt
return pdf_parse_union(pdf_bytes,
│ └ b'%PDF-1.5\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec...
└ <function pdf_parse_union at 0x000001FE33EDD5A0>

File "E:\MinerU-master\magic_pdf\pdf_parse_union_core.py", line 225, in pdf_parse_union
page_info = parse_page_core(pdf_docs, magic_model, page_id, pdf_bytes_md5, imageWriter, parse_mode)
│ │ │ │ │ │ └ 'txt'
│ │ │ │ │ └ <magic_pdf.rw.DiskReaderWriter.DiskReaderWriter object at 0x000001FE03BA2E60>
│ │ │ │ └ '036C1D1F6867C983E74EEA67B33E09D6'
│ │ │ └ 0
│ │ └ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>
│ └ Document('', <memory, doc# 4>)
└ <function parse_page_core at 0x000001FE33EDD510>

File "E:\MinerU-master\magic_pdf\pdf_parse_union_core.py", line 83, in parse_page_core
img_blocks = magic_model.get_imgs(page_id)
│ │ └ 0
│ └ <function MagicModel.get_imgs at 0x000001FE316320E0>
└ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>

File "E:\MinerU-master\magic_pdf\model\magic_model.py", line 459, in get_imgs
records, _ = self.__tie_up_category_by_distance(page_no, 3, 4)
│ └ 0
└ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>

File "E:\MinerU-master\magic_pdf\model\magic_model.py", line 186, in __tie_up_category_by_distance
self.__model_list[page_no]["layout_dets"],
│ └ 0
└ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>

IndexError: list index out of range
2024-07-16 11:22:00.390 | WARNING | magic_pdf.user_api:parse_union_pdf:90 - parse_pdf_by_txt drop or error, switch to parse_pdf_by_ocr
2024-07-16 11:22:00.390 | ERROR | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:92 - use_inside_model is False, not allow to use inside model

Process finished with exit code 1
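
Reading the traceback: model_list is passed in as [] and magic_model.py then indexes it by page number, which on its own is enough to raise this IndexError. A minimal sketch of that failure follows; get_layout_dets is a hypothetical helper written for illustration, not MinerU's actual code.

```python
# Hypothetical reduction of the lookup shown in the traceback:
# an empty model list indexed by page number.
model_list = []  # the traceback shows model_list passed as []
page_no = 0


def get_layout_dets(model_list, page_no):
    """Mimic the model_list[page_no]["layout_dets"] lookup."""
    return model_list[page_no]["layout_dets"]


try:
    get_layout_dets(model_list, page_no)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)  # list index out of range
```

If this reading is right, the error points to the model list never having been populated rather than to a problem with the PDF itself.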

How to reproduce the bug

1

Operating system

Windows

Python version

3.10

Device mode

cuda

@fanzz1 fanzz1 added the bug Something isn't working label Jul 16, 2024
@myhloli
Collaborator

myhloli commented Jul 16, 2024

Sorry, I forgot to update the documentation. Adding these two lines to your code first should fix it:

import magic_pdf.model as model_config
model_config.use_inside_model = True

@myhloli
Collaborator

myhloli commented Jul 16, 2024

The demo has been fixed.

@myhloli myhloli closed this as completed Jul 16, 2024
@fanzz1
Author

fanzz1 commented Jul 16, 2024

I'm still getting the same error.

@myhloli myhloli reopened this Jul 16, 2024
@myhloli
Collaborator

myhloli commented Jul 16, 2024

> I'm still getting the same error.

Have you checked whether it works correctly from the command line?

@fanzz1
Author

fanzz1 commented Jul 16, 2024

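Could this be because my weight files are in the wrong location?
When I run this on the command line: magic-pdf pdf-command --pdf "1.pdf" --inside_model true
I get: FileNotFoundError: [Errno 2] No such file or directory: '\tmp\models\MFD\weights.pt'
But I have already downloaded the weight files. According to the instructions, should the model files go in the location shown in the screenshot?
[Image: 微信截图_20240716152903]
Thanks!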

@myhloli
Collaborator

myhloli commented Jul 16, 2024

The model path must be an absolute path. You can cd into that directory and run pwd to see the absolute path, then set it in magic-pdf.json.
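
For anyone who prefers not to use cd and pwd, the same absolute path can be obtained from Python. This is only a sketch: absolute_models_dir is a hypothetical helper name, and "models" is a placeholder for wherever the weights were actually downloaded.

```python
from pathlib import Path


def absolute_models_dir(path_text: str) -> str:
    """Return an absolute path with forward slashes for the given directory."""
    return Path(path_text).expanduser().resolve().as_posix()


# "models" is a placeholder relative path; substitute the real weights folder.
models_dir = absolute_models_dir("models")
print(models_dir)
```

The printed value is what would then go into magic-pdf.json as the model path.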

@Jamly7

Jamly7 commented Jul 16, 2024

After adding those two lines I still get the error, and the log still says use_inside_model is False, even though printing use_inside_model right after those two lines gives True:

(MinerU) root@quantum-slaver2:~/lh/MinerU# python3 MinerU.py
True
2024-07-16 08:03:20.043 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 27, text_len: 8790, cid_chars_radio: 0.0031690140845070424
2024-07-16 08:03:20.056 | ERROR | magic_pdf.user_api:parse_pdf:85 - list index out of range

The code is as follows:

import os
import json

from loguru import logger

from magic_pdf.pipe.UNIPipe import UNIPipe
from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter

import magic_pdf.model as model_config
model_config.use_inside_model = True

print(model_config.use_inside_model)

@Jamly7

Jamly7 commented Jul 16, 2024

It should be model_config.__use_inside_model__ = True; the double underscores on each side were missing.

@myhloli
Collaborator

myhloli commented Jul 16, 2024

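> It should be model_config.__use_inside_model__ = True; the double underscores on each side were missing.

You're right, the underscores were swallowed by Markdown formatting.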

model_config.__use_inside_model__ = True;
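
Why the version without underscores appeared to work when printed but changed nothing: assigning to a new attribute name on a module succeeds silently, so the print shows True while the flag the library reads stays False. The sketch below uses a stand-in module object so it runs without MinerU installed; inside_model_allowed is a hypothetical check, assumed to read only the double-underscore name.

```python
import types

# Hypothetical stand-in for magic_pdf.model, so the sketch runs without MinerU.
model_config = types.ModuleType("fake_model_config")
model_config.__use_inside_model__ = False


def inside_model_allowed(config) -> bool:
    """Mimic a library check that reads only the double-underscore flag."""
    return config.__use_inside_model__


# Name without underscores: creates a new, unrelated attribute.
model_config.use_inside_model = True
wrong_name_result = inside_model_allowed(model_config)

# Name with underscores: updates the flag that is actually read.
model_config.__use_inside_model__ = True
right_name_result = inside_model_allowed(model_config)

print(wrong_name_result, right_name_result)  # False True
```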

@myhloli myhloli closed this as completed Jul 16, 2024
@ShadowCabnient

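> Could this be because my weight files are in the wrong location? When I run this on the command line: magic-pdf pdf-command --pdf "1.pdf" --inside_model true I get: FileNotFoundError: [Errno 2] No such file or directory: '\tmp\models\MFD\weights.pt' But I have already downloaded the weight files. According to the instructions, should the model files go in the location shown in the screenshot? [Image: 微信截图_20240716152903] Thanks!

I'm running on Windows and get the same error. I don't understand what the cp step is for.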

@myhloli
Collaborator

myhloli commented Jul 23, 2024

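> > Could this be because my weight files are in the wrong location? When I run this on the command line: magic-pdf pdf-command --pdf "1.pdf" --inside_model true I get: FileNotFoundError: [Errno 2] No such file or directory: '\tmp\models\MFD\weights.pt' But I have already downloaded the weight files. According to the instructions, should the model files go in the location shown in the screenshot? [Image: 微信截图_20240716152903] Thanks!

> I'm running on Windows and get the same error. I don't understand what the cp step is for.

The image did not upload successfully...

P.S. The cp step copies the default configuration file to your user directory; the program later reads many of its runtime parameters from that file.
On Windows you can run the cp command in PowerShell.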
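
If cp is unavailable or unfamiliar, the same copy can be sketched in Python. The file names magic-pdf.template.json and magic-pdf.json, and the "models-dir" key, are assumptions for illustration; check the project's own instructions for the exact names. The demonstration uses temporary directories so nothing in the real user directory is touched.

```python
import shutil
import tempfile
from pathlib import Path


def copy_default_config(template: Path, user_dir: Path) -> Path:
    """Copy a template config into user_dir as magic-pdf.json (names are assumptions)."""
    target = user_dir / "magic-pdf.json"
    shutil.copyfile(template, target)
    return target


# Demonstrate with temporary directories instead of the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp) / "repo"
    home_dir = Path(tmp) / "home"
    repo_dir.mkdir()
    home_dir.mkdir()
    template = repo_dir / "magic-pdf.template.json"
    template.write_text('{"models-dir": "/tmp/models"}', encoding="utf-8")

    copied = copy_default_config(template, home_dir)
    copied_text = copied.read_text(encoding="utf-8")
    print(copied.name, copied_text)
```

In real use, user_dir would be Path.home() and template would point at the template file in the cloned repository.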

@Sep24thLegend

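[Image]
After downloading the model weight files, do I need to manually change the paths in the magic-pdf.json configuration file? Also, the default download path for the model weights is shown above; do the weights need to be in the same folder as the PDF I want to process and the output model files?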

@myhloli
Collaborator

myhloli commented Nov 2, 2024

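> [Image] After downloading the model weight files, do I need to manually change the paths in the magic-pdf.json configuration file? Also, the default download path for the model weights is shown above; do the weights need to be in the same folder as the PDF I want to process and the output model files?

You don't need to change the paths in the JSON; they are configured automatically once the download completes. The model files don't need to be moved either. Just continue with the next step of the tutorial.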

@Sep24thLegend

[Image]
Could you tell me what causes this?

@myhloli
Collaborator

myhloli commented Nov 2, 2024

Don't wrap the path in {}, and replace every \ with /.
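
A likely reason the backslashes matter: in JSON text a backslash starts an escape sequence, so a Windows path pasted as-is can fail to parse. The sketch below shows both fixes; normalize_model_path is a hypothetical helper, and the "models-dir" key and the D:\models\MFD path are assumptions for illustration.

```python
import json


def normalize_model_path(raw: str) -> str:
    """Strip surrounding braces and convert backslashes to forward slashes."""
    cleaned = raw.strip()
    if cleaned.startswith("{") and cleaned.endswith("}"):
        cleaned = cleaned[1:-1]
    return cleaned.replace("\\", "/")


raw_path = r"{D:\models\MFD}"  # hypothetical path as it might be pasted
fixed_path = normalize_model_path(raw_path)

# A single backslash in JSON text begins an escape, and \m is not a valid one.
bad_json = r'{"models-dir": "D:\models\MFD"}'
try:
    json.loads(bad_json)
    bad_json_parses = True
except json.JSONDecodeError:
    bad_json_parses = False

# The normalized path survives a JSON round trip unchanged.
good_json = json.dumps({"models-dir": fixed_path})
round_trip = json.loads(good_json)["models-dir"]
print(fixed_path, bad_json_parses, round_trip)
```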

@Sep24thLegend

Solved (many thanks!)

@dt-yy dt-yy added the P1 P1 BUG label Jan 3, 2025
6 participants