Error when running the demo #155
Comments
Sorry, I forgot to update the documentation. Adding these two lines to your code first should fix it: import magic_pdf.model as model_config; model_config.use_inside_model = True
The demo has been fixed.
I'm still getting the same error.
Have you tried it from the command line? Does it work there?
The model path must be an absolute path. You can cd into that directory and run pwd to see the absolute path, then put it in magic-pdf.json.
After adding those two lines it still fails, and the error still reports use_inside_model as False, even though printing use_inside_model right after the two lines shows True:
(MinerU) root@quantum-slaver2:~/lh/MinerU# python3 MinerU.py
from loguru import logger
from magic_pdf.pipe.UNIPipe import UNIPipe
import magic_pdf.model as model_config
print(model_config.use_inside_model)
It should be `model_config.__use_inside_model__ = True`; the two underscores on each side of the name are missing.
You're right, the underscores were swallowed by Markdown escaping. The line should be `model_config.__use_inside_model__ = True`
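The confusion above can be reproduced without MinerU installed. The sketch below uses a stand-in module object (hypothetical, built with `types.ModuleType`) in place of `magic_pdf.model`, and assumes, as the thread suggests, that the library reads the attribute named `__use_inside_model__`. Assigning to the name without underscores only creates a second, unrelated attribute, which may be why the print showed True while the error still reported False.

```python
import types


def flag_after_setting(attr_name: str) -> bool:
    """Set `attr_name` to True on a stand-in module, then return the dunder flag.

    The stand-in is hypothetical; it only mimics a module that starts with
    __use_inside_model__ = False, as the error message in this issue implies.
    """
    stand_in = types.ModuleType("stand_in_model_config")
    stand_in.__use_inside_model__ = False
    setattr(stand_in, attr_name, True)
    return stand_in.__use_inside_model__


# Name as rendered after Markdown stripped the underscores: the real flag is untouched.
print(flag_after_setting("use_inside_model"))

# Name with the double underscores restored: the flag actually changes.
print(flag_after_setting("__use_inside_model__"))
```

With the real package, the equivalent would presumably be the two lines from the thread with the double-underscore name, run before the pipeline is constructed.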
Don't wrap the paths in {}. This applies to all of them.
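The two pieces of path advice in this thread (use an absolute path, and do not wrap it in braces) can be checked before launching the demo. This is a minimal sketch: `check_model_path` is a hypothetical helper, not part of magic_pdf, and it takes the path string directly rather than assuming any particular key name inside magic-pdf.json.

```python
import os


def check_model_path(path: str) -> list[str]:
    """Return a list of problems found in a configured model path (empty if none)."""
    problems = []
    candidate = path.strip()

    # A value copied from a template may still carry placeholder braces.
    if candidate.startswith("{") or candidate.endswith("}"):
        problems.append("path is wrapped in braces")
        candidate = candidate.strip("{}")

    # os.path.isabs follows the rules of the platform the script runs on.
    if not os.path.isabs(candidate):
        problems.append("path is not absolute")

    return problems


for value in ["/opt/models", "models", "{/opt/models}"]:
    print(value, check_model_path(value))
```

The example paths are placeholders; substitute whatever value your own magic-pdf.json holds.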
Resolved. Thanks a lot!
Description of the bug
2024-07-16 11:22:00.324 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 14, text_len: 26394, cid_chars_radio: 0.0005324003650745361
2024-07-16 11:22:00.329 | ERROR | magic_pdf.user_api:parse_pdf:85 - list index out of range
Traceback (most recent call last):
File "E:\MinerU-master\demo\demo.py", line 23, in <module>
pipe.pipe_parse()
│ └ <function UNIPipe.pipe_parse at 0x000001FE33EDD990>
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
File "E:\MinerU-master\magic_pdf\pipe\UNIPipe.py", line 35, in pipe_parse
self.pdf_mid_data = parse_union_pdf(self.pdf_bytes, self.model_list, self.image_writer,
│ │ │ │ │ │ │ │ └ <magic_pdf.rw.DiskReaderWriter.DiskReaderWriter object at 0x000001FE03BA2E60>
│ │ │ │ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
│ │ │ │ │ │ └ []
│ │ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
│ │ │ │ └ b'%PDF-1.5\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec...
│ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
│ │ └ <function parse_union_pdf at 0x000001FE33EDD750>
│ └ None
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x000001FE03BA2DD0>
File "E:\MinerU-master\magic_pdf\user_api.py", line 88, in parse_union_pdf
pdf_info_dict = parse_pdf(parse_pdf_by_txt)
│ └ <function parse_pdf_by_txt at 0x000001FE33EDD630>
└ <function parse_union_pdf.<locals>.parse_pdf at 0x000001FE037F3E20>
File "E:\MinerU-master\magic_pdf\pdf_parse_by_txt.py", line 12, in parse_pdf_by_txt
return pdf_parse_union(pdf_bytes,
│ └ b'%PDF-1.5\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec...
└ <function pdf_parse_union at 0x000001FE33EDD5A0>
File "E:\MinerU-master\magic_pdf\pdf_parse_union_core.py", line 225, in pdf_parse_union
page_info = parse_page_core(pdf_docs, magic_model, page_id, pdf_bytes_md5, imageWriter, parse_mode)
│ │ │ │ │ │ └ 'txt'
│ │ │ │ │ └ <magic_pdf.rw.DiskReaderWriter.DiskReaderWriter object at 0x000001FE03BA2E60>
│ │ │ │ └ '036C1D1F6867C983E74EEA67B33E09D6'
│ │ │ └ 0
│ │ └ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>
│ └ Document('', <memory, doc# 4>)
└ <function parse_page_core at 0x000001FE33EDD510>
File "E:\MinerU-master\magic_pdf\pdf_parse_union_core.py", line 83, in parse_page_core
img_blocks = magic_model.get_imgs(page_id)
│ │ └ 0
│ └ <function MagicModel.get_imgs at 0x000001FE316320E0>
└ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>
File "E:\MinerU-master\magic_pdf\model\magic_model.py", line 459, in get_imgs
records, _ = self.__tie_up_category_by_distance(page_no, 3, 4)
│ └ 0
└ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>
File "E:\MinerU-master\magic_pdf\model\magic_model.py", line 186, in __tie_up_category_by_distance
self.__model_list[page_no]["layout_dets"],
│ └ 0
└ <magic_pdf.model.magic_model.MagicModel object at 0x000001FE354A3970>
IndexError: list index out of range
2024-07-16 11:22:00.390 | WARNING | magic_pdf.user_api:parse_union_pdf:90 - parse_pdf_by_txt drop or error, switch to parse_pdf_by_ocr
2024-07-16 11:22:00.390 | ERROR | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:92 - use_inside_model is False, not allow to use inside model
Process finished with exit code 1
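In the traceback above, the model list passed to parse_union_pdf is shown as `[]` and page_no is `0`, so `self.__model_list[page_no]` indexes an empty list and raises the IndexError. The sketch below reproduces that failure in isolation and shows one defensive lookup. `layout_dets_for_page` is a hypothetical helper for illustration only, not the project's actual fix; the fix discussed in this thread is to enable the inside model so that the list is populated.

```python
def layout_dets_for_page(model_list: list, page_no: int) -> list:
    """Return the layout detections for a page, or [] when the model list has no entry."""
    if not 0 <= page_no < len(model_list):
        return []
    return model_list[page_no]["layout_dets"]


# Reproduce the failure from the log: an empty model list indexed at page 0.
try:
    [][0]["layout_dets"]
except IndexError as exc:
    print("IndexError:", exc)

# The guarded lookup returns an empty result instead of raising.
print(layout_dets_for_page([], 0))
```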
How to reproduce the bug
1
Operating system
Windows
Python version
3.10
Device mode
cuda