Update PDF extraction and OCR options for hybrid chunking · instructlab/sdg@6790918

Triggered via pull request February 12, 2025 20:29

synchronize #557

Status Failure

Total duration 2m 58s

Artifacts: none

lint.yml

on: pull_request

Matrix: lint

7 errors and 2 warnings

pylint: src/instructlab/sdg/utils/chunkers.py#L11

E0401: Unable to import 'docling.chunking' (import-error)

pylint: src/instructlab/sdg/utils/chunkers.py#L11

E0611: No name 'chunking' in module 'docling' (no-name-in-module)

pylint: src/instructlab/sdg/utils/chunkers.py#L14

E0611: No name 'AcceleratorOptions' in module 'docling.datamodel.pipeline_options' (no-name-in-module)

pylint: src/instructlab/sdg/utils/chunkers.py#L63

E1121: Too many positional arguments for constructor call (too-many-function-args)

pylint: src/instructlab/sdg/utils/chunkers.py#L63

E1123: Unexpected keyword argument 'accelerator_options' in constructor call (unexpected-keyword-arg)

pylint: src/instructlab/sdg/utils/taxonomy.py#L16

E0401: Unable to import 'docling_parse.pdf_parsers' (import-error)

Process completed with exit code 6.

pylint: src/instructlab/sdg/utils/chunkers.py#L22

W0611: Unused tabulate imported from tabulate (unused-import)

pylint: src/instructlab/sdg/utils/taxonomy.py#L154

W0718: Catching too general exception Exception (broad-exception-caught)
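
Note on the exit code: Pylint's exit status is a bit field, so the "exit code 6" reported in the annotations is consistent with a run that emitted both error (2) and warning (4) messages. The sketch below decodes such a status. The helper name `decode_pylint_exit` is hypothetical and is not part of Pylint or of this repository; the bit values are the ones documented by Pylint.

```python
# Pylint exit status bits, as documented by Pylint.
# Each set bit means at least one message of that category was issued.
PYLINT_EXIT_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_pylint_exit(code: int) -> list[str]:
    """Return the message categories encoded in a Pylint exit status.

    Hypothetical helper for illustration only.
    """
    return [name for bit, name in PYLINT_EXIT_BITS.items() if code & bit]


if __name__ == "__main__":
    # 6 == 2 (error) + 4 (warning)
    print(decode_pylint_exit(6))  # ['error', 'warning']
```

Read this way, the failing status reflects the E-level findings (E0401, E0611, E1121, E1123) together with the two W-level findings (W0611, W0718), rather than six distinct problems.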