Python tool for converting files and office documents to Markdown.
A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Get your documents ready for gen AI
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/Docker/Zotero
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
Parse files for optimal RAG
borb is a library for reading, creating and manipulating PDF files in Python.
💀 Generate a bunch of malicious pdf files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh