SDG fails on markdown table with preceding text #548

Open
cfchase opened this issue Feb 11, 2025 · 1 comment · May be fixed by #549
Labels
bug Something isn't working


cfchase commented Feb 11, 2025

Describe the bug
When a source document contains a table with some preceding text, SDG fails with the error: failed to generate data with exception: list index out of range
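To check whether other documents in a taxonomy have the same shape as the failing one, a small heuristic scan may help. The sketch below is not part of InstructLab or SDG. The function name `has_table_after_text` and the regular expression are hypothetical, and the check only looks for a pipe table whose header row comes after at least one line of non-blank text. It makes no claim about where the `list index out of range` exception is actually raised.

```python
import re

# Matches a table delimiter row such as |-------|-------| or |:--|--:|.
# Hypothetical helper pattern, not taken from the SDG code base.
DELIMITER_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def has_table_after_text(markdown: str) -> bool:
    """Return True if the first pipe table in `markdown` is preceded by non-blank text."""
    lines = markdown.splitlines()
    seen_text = False
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A header row is a line containing a pipe, followed by a delimiter row.
        is_header = "|" in line and "|" in nxt and DELIMITER_ROW.match(nxt) is not None
        if is_header:
            return seen_text
        if line.strip():
            seen_text = True
    return False


if __name__ == "__main__":
    sample = "Hello World\n\n| Hello | Hello |\n|-------|-------|\n| World | World |\n"
    print(has_table_after_text(sample))
```

Documents flagged by this scan match the pattern described in this report. Whether each of them triggers the failure is something only a real `ilab data generate` run can confirm.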

To Reproduce
Steps to reproduce the behavior:

  1. Create a Markdown file in a git repo, such as
     https://github.com/cfchase/sample-md/blob/main/README.md

       Hello World

       | Hello | Hello |
       |-------|-------|
       | World | World |

  2. Create a qna.yaml in your taxonomy that refers to the Markdown file, such as
     https://github.com/cfchase/sample-md/blob/main/qna.yaml

       #~/.local/share/instructlab/taxonomy/knowledge/qna.yaml
       ...snip...
       document:
         repo: 'https://github.com/cfchase/sample-md.git'
         commit: b5bbdd7516fd5f06956f2a1e3f207790a750c00e
         patterns:
           - 'README.md'

  3. Run ilab data generate
  4. See error: failed to generate data with exception: list index out of range
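For anyone trying to reproduce this locally, the Markdown file from step 1 can be generated with a few lines of Python. This is a minimal sketch: the function name `write_reproducer` and the target directory are assumptions made for illustration, and the file contents are copied from the sample shown in this report. The script does not create the git repo, the commit, or the qna.yaml entry, which the remaining steps still require.

```python
import tempfile
from pathlib import Path

# Contents copied from the sample in this report: one line of text,
# a blank line, then a small pipe table.
README = """Hello World

| Hello | Hello |
|-------|-------|
| World | World |
"""


def write_reproducer(target_dir: Path) -> Path:
    """Write the sample README.md into target_dir and return its path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    readme = target_dir / "README.md"
    readme.write_text(README, encoding="utf-8")
    return readme


if __name__ == "__main__":
    # The directory name is arbitrary; a temporary directory keeps the demo self-cleaning.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_reproducer(Path(tmp) / "sample-md")
        print(path.read_text(encoding="utf-8"))
```

The file then needs to be committed to a git repo, and the commit hash referenced from the `document` section of qna.yaml, as in step 2.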

Expected behavior
SDG continues past the document ingestion

Command Used
ilab data generate --pipeline=simple

Screenshots

Device Info (please complete the following information):

  • Hardware Specs: Apple M3 Pro Chip, 36 GB Memory
  • OS Version: macOS 15.3
  • Python Version: Python 3.11.9
  • InstructLab Version:
  sys.version: 3.11.9 (main, Aug 26 2024, 10:26:18) [Clang 15.0.0 (clang-1500.3.9.4)]
  sys.platform: darwin
  os.name: posix
  platform.release: 24.3.0
  platform.machine: arm64
  platform.node: cchase-mac
  platform.python_version: 3.11.9
  platform.cpu_brand: Apple M3 Pro
  memory.total: 36.00 GB
  memory.available: 12.11 GB
  memory.used: 18.85 GB

InstructLab:
  instructlab.version: 0.23.0rc1.dev124
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.7.1.dev46
  instructlab-training.version: 0.7.0

Torch:
  torch.version: 2.4.1
  torch.backends.cpu.capability: NO AVX
  torch.version.cuda: None
  torch.version.hip: None
  torch.cuda.available: False
  torch.backends.cuda.is_built: False
  torch.backends.mps.is_built: True
  torch.backends.mps.is_available: True

llama_cpp_python:
  llama_cpp_python.version: 0.3.6
  llama_cpp_python.supports_gpu_offload: True

Additional context

cfchase added the bug label Feb 11, 2025
khaledsulayman (Member) commented

Thanks for opening this, Chris! Will see if I can reproduce this.
