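The current version [0.8.1] detects headers and footers in PDF documents and removes them automatically. In some scenarios the headers and footers contain important content that needs to be kept in the final parsing result. Could you add a global switch parameter for header/footer removal, and use it to control whether header and footer content is removed?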
The discarded_blocks field in middle.json stores the text that was removed from each page; you can write your own logic to export it.
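A rough sketch of what that export logic could look like. It assumes middle.json has a top-level pdf_info list with one entry per page, and that each page entry carries page_idx and discarded_blocks, with the text sitting in string fields named content nested somewhere inside each block. Those layout details are assumptions and may differ between versions, so check them against your own file. The helper names are made up for illustration.

```python
import json
from pathlib import Path


def collect_text(node):
    """Recursively gather every string stored under a "content" key."""
    texts = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "content" and isinstance(value, str):
                texts.append(value)
            else:
                texts.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            texts.extend(collect_text(item))
    return texts


def discarded_text_by_page(middle):
    """Map page index -> list of text found in that page's discarded_blocks.

    Assumes middle["pdf_info"] is a list of per-page dicts (an assumption
    about the middle.json layout, not something confirmed in this thread).
    """
    result = {}
    for position, page in enumerate(middle.get("pdf_info", [])):
        # Fall back to the list position if the page has no page_idx field.
        page_idx = page.get("page_idx", position)
        result[page_idx] = collect_text(page.get("discarded_blocks", []))
    return result


def load_discarded_text(path):
    """Read a middle.json file and return its discarded text per page."""
    with Path(path).open(encoding="utf-8") as f:
        return discarded_text_by_page(json.load(f))
```

Calling load_discarded_text on the path of your middle.json returns a dict keyed by page index, which you can merge back into the parsed output wherever the headers and footers are needed.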
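The page numbers printed in the book were not recognized. I need them to locate content for my use case. How can I get them in the output?
They are not in discarded_blocks either.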
If you only need the page number, contentlist has a pageidx field that gives the page number.
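I need to go from the page numbers listed in the table of contents to the corresponding locations. page_index starts at 0, and the book has a cover, copyright page, table of contents, preface and so on before the main text, so I cannot get the correct page number from it.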
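One possible workaround, sketched under assumptions: if you find out by hand which zero-based page index holds the first numbered page of the main text, and the printed numbering runs continuously from there, the mapping is a constant offset. The function and parameter names below are hypothetical and not part of the tool.

```python
def printed_to_page_idx(printed_page, first_body_page_idx, first_printed_number=1):
    """Convert a printed page number to a zero-based page index.

    first_body_page_idx: zero-based index of the PDF page that carries the
        printed number first_printed_number (found manually; an assumption).
    """
    return first_body_page_idx + (printed_page - first_printed_number)


def page_idx_to_printed(page_idx, first_body_page_idx, first_printed_number=1):
    """Inverse mapping; returns None for front matter before the main text."""
    if page_idx < first_body_page_idx:
        return None
    return first_printed_number + (page_idx - first_body_page_idx)
```

For example, if the cover, copyright page, table of contents and preface take up 12 PDF pages, the first numbered page sits at index 12, so a table-of-contents entry pointing to page 35 maps to printed_to_page_idx(35, 12). This breaks down if the book restarts or skips numbering partway through.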