From 6e8e81c9de9d89d2d9e33597d18a3a4dd1f7427b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B5=B5=E5=B0=8F=E8=92=99?=
Date: Tue, 25 Jun 2024 17:46:34 +0800
Subject: [PATCH] update readme

---
 README.md       | 45 +++++++++++++++++++++++++++++++++-------
 README_zh-CN.md | 55 ++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 2b06f4af..cdcaa6ed 100644
--- a/README.md
+++ b/README.md
@@ -41,21 +41,52 @@ Key features include:
 
 ### Usage Instructions
 
-1. **Install Magic-PDF**
-
+#### 1. Install Magic-PDF
 
 ```bash
-pip install magic-pdf[cpu] # Install the CPU version
-or
-pip install magic-pdf[gpu] # Install the GPU version
+pip install magic-pdf
 ```
 
-2. **Usage via Command Line**
+#### 2. Usage via Command Line
 
+###### Simple
+```bash
+cp magic-pdf.template.json ~/magic-pdf.json
+magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
+```
+###### More
 ```bash
 magic-pdf --help
 ```
 
-### All Thanks To Our Contributors
+#### 3. Usage via API
+
+###### Local
+```python
+image_writer = DiskReaderWriter(local_image_dir)
+image_dir = str(os.path.basename(local_image_dir))
+jso_useful_key = {"_pdf_type": "", "model_list": model_json}
+pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)
+pipe.pipe_classify()
+pipe.pipe_parse()
+md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
+```
+
+###### Object Storage
+```python
+s3pdf_cli = S3ReaderWriter(pdf_ak, pdf_sk, pdf_endpoint)
+image_dir = "s3://img_bucket/"
+s3image_cli = S3ReaderWriter(img_ak, img_sk, img_endpoint, parent_path=image_dir)
+pdf_bytes = s3pdf_cli.read(s3_pdf_path, mode=s3pdf_cli.MODE_BIN)
+jso_useful_key = {"_pdf_type": "", "model_list": model_json}
+pipe = UNIPipe(pdf_bytes, jso_useful_key, s3image_cli)
+pipe.pipe_classify()
+pipe.pipe_parse()
+md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
+```
+
+For a complete example, see [demo.py](https://github.com/magicpdf/Magic-PDF/blob/master/demo/demo.py)
+
+## All Thanks To Our Contributors
 
 
diff --git a/README_zh-CN.md b/README_zh-CN.md
index fbd0574a..1d5c9efc 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -17,7 +17,7 @@
 
 # Magic-PDF
 
-### 简介
+## 简介
 
 Magic-PDF 是一款将 PDF 转化为 markdown 格式的工具。支持转换本地文档或者位于支持S3协议对象存储上的文件。
 
@@ -33,33 +33,64 @@ Magic-PDF 是一款将 PDF 转化为 markdown 格式的工具。支持转换本
 - 支持cpu和gpu环境
 - 支持windows/linux/mac平台
 
-### 上手指南
+## 上手指南
 
-###### 配置要求
+### 配置要求
 
 python 3.9+
 
-###### 使用说明
-
-1.安装Magic-PDF
+### 使用说明
+#### 1. 安装Magic-PDF
 
 ```bash
-pip install magic-pdf[cpu] # 安装 cpu 版本
-或
-pip install magic-pdf[gpu] # 安装 gpu 版本
+pip install magic-pdf
 ```
 
-2.通过命令行使用
+#### 2. 通过命令行使用
 
+###### 直接使用
+```bash
+cp magic-pdf.template.json ~/magic-pdf.json
+magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
+```
+###### 更多用法
 ```bash
 magic-pdf --help
 ```
 
-### 版权说明
+#### 3. 通过接口调用
+
+###### 本地使用
+```python
+image_writer = DiskReaderWriter(local_image_dir)
+image_dir = str(os.path.basename(local_image_dir))
+jso_useful_key = {"_pdf_type": "", "model_list": model_json}
+pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)
+pipe.pipe_classify()
+pipe.pipe_parse()
+md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
+```
+
+###### 在对象存储上使用
+```python
+s3pdf_cli = S3ReaderWriter(pdf_ak, pdf_sk, pdf_endpoint)
+image_dir = "s3://img_bucket/"
+s3image_cli = S3ReaderWriter(img_ak, img_sk, img_endpoint, parent_path=image_dir)
+pdf_bytes = s3pdf_cli.read(s3_pdf_path, mode=s3pdf_cli.MODE_BIN)
+jso_useful_key = {"_pdf_type": "", "model_list": model_json}
+pipe = UNIPipe(pdf_bytes, jso_useful_key, s3image_cli)
+pipe.pipe_classify()
+pipe.pipe_parse()
+md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
+```
+
+详细实现可参考 [demo.py](https://github.com/magicpdf/Magic-PDF/blob/master/demo/demo.py)
+
+## 版权说明
 
 [LICENSE.md](https://github.com/magicpdf/Magic-PDF/blob/master/LICENSE.md)
 
-### 鸣谢
+## 鸣谢
 
 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 - [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
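
Note on the `Local` snippet added by this patch: it uses `pdf_bytes`, `model_json`, and `local_image_dir` without defining them, and it calls `os.path.basename` without showing `import os`. The following is a minimal sketch of how those inputs might be prepared with the standard library only. `load_inputs` is a hypothetical helper, not part of Magic-PDF, and it assumes the model output is a JSON file on disk (as the `--model "model_json_path"` CLI flag suggests) and that images should go in a directory next to the PDF.

```python
import json
import os
from pathlib import Path


def load_inputs(pdf_path, model_json_path, image_dir_name="images"):
    """Prepare the values the README's Local snippet expects.

    Hypothetical helper for illustration; not part of Magic-PDF.
    """
    pdf_path = Path(pdf_path)

    # Raw PDF content, passed to the pipeline as bytes.
    pdf_bytes = pdf_path.read_bytes()

    # Assumption: the model output is stored as a JSON document on disk.
    model_json = json.loads(Path(model_json_path).read_text(encoding="utf-8"))

    # Assumption: extracted images are written next to the PDF.
    local_image_dir = str(pdf_path.parent / image_dir_name)
    os.makedirs(local_image_dir, exist_ok=True)

    # Same derivation as in the README snippet: only the directory's
    # base name is handed to the markdown step.
    image_dir = os.path.basename(local_image_dir)

    return pdf_bytes, model_json, local_image_dir, image_dir
```

The four returned values line up with the names used in the README snippet, so they could be unpacked directly before constructing `DiskReaderWriter` and `UNIPipe`. Those two classes still need to be imported from the package; the import paths are not shown in this patch.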