Update PDF extraction and OCR options for hybrid chunking #1182

Triggered via pull request February 12, 2025 20:29
Status Failure
Total duration 2m 58s

lint.yml

on: pull_request
Matrix: lint
lint-workflow-complete
0s

Annotations

7 errors and 2 warnings
pylint: src/instructlab/sdg/utils/chunkers.py#L11
E0401: Unable to import 'docling.chunking' (import-error)
pylint: src/instructlab/sdg/utils/chunkers.py#L11
E0611: No name 'chunking' in module 'docling' (no-name-in-module)
pylint: src/instructlab/sdg/utils/chunkers.py#L14
E0611: No name 'AcceleratorOptions' in module 'docling.datamodel.pipeline_options' (no-name-in-module)
pylint: src/instructlab/sdg/utils/chunkers.py#L63
E1121: Too many positional arguments for constructor call (too-many-function-args)
pylint: src/instructlab/sdg/utils/chunkers.py#L63
E1123: Unexpected keyword argument 'accelerator_options' in constructor call (unexpected-keyword-arg)
pylint: src/instructlab/sdg/utils/taxonomy.py#L16
E0401: Unable to import 'docling_parse.pdf_parsers' (import-error)
pylint
Process completed with exit code 6.
pylint: src/instructlab/sdg/utils/chunkers.py#L22
W0611: Unused tabulate imported from tabulate (unused-import)
pylint: src/instructlab/sdg/utils/taxonomy.py#L154
W0718: Catching too general exception Exception (broad-exception-caught)
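
Note on the exit code: pylint documents its process exit status as a bit field (1 fatal, 2 error, 4 warning, 8 refactor, 16 convention, 32 usage error), so the "exit code 6" above is consistent with a run that emitted both error and warning messages. A minimal sketch of decoding it follows; the helper name decode_pylint_exit is hypothetical and not part of pylint or of this repository.

```python
# Bit flags that pylint documents for its process exit status.
PYLINT_EXIT_FLAGS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_pylint_exit(code):
    """Return the message categories encoded in a pylint exit status.

    Hypothetical helper for illustration only.
    """
    return [name for bit, name in PYLINT_EXIT_FLAGS.items() if code & bit]


if __name__ == "__main__":
    # 6 == 2 | 4, i.e. error and warning messages were both issued.
    print(decode_pylint_exit(6))
```

A CI step that should fail only on errors, not warnings, could test the relevant bits of the status instead of treating any non-zero value as failure; whether this workflow does so is not shown in the log.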