Error: No usable bins found for haploid depth estimation #25

Closed
fellen31 opened this issue Mar 14, 2024 · 7 comments

fellen31 commented Mar 14, 2024

Hi,

I'm trying to run HiFiCNV (0.1.7-70e9988) on a subset of data, which would be useful for testing.
I've tried using --exclude to exclude all regions that are empty in the BAM file, but I still get this error:

thread 'main' panicked at 'No usable bins found for haploid depth estimation.', src/copy_number_segmentation.rs:93:5

The BAM file is generated like this:

samtools view -M -L test_data.bed HG003_PacBio_GRCh38.bam -h -O BAM > HG003.bam

test_data.bed:

chr20   2642593 2668393 NOP56
chrX    67534021        67740619        AR
chr5    70915030        70963942        SMN1
chr5    70039638        70088522        SMN2
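
For reference, one way to build an exclude BED that covers everything outside these targets would be something along these lines (just a sketch, not necessarily how the exclude file above was made; assumes bedtools and a genome.txt file of chromosome sizes, neither of which is part of the setup above):

# complement of the target regions = all regions with no targeted reads
sort -k1,1 -k2,2n test_data.bed > test_data.sorted.bed
bedtools complement -i test_data.sorted.bed -g genome.txt > exclude_outside_targets.bed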

Thanks a lot!

hificnv \
      --bam HG003.bam \
      --expected-cn expected_cn.hg38.XX.bed \
      --exclude test_profile.hificnv.exclude_regions.bed \
      --maf HG003.sorted.vcf.gz \
      --ref GRCh38_no_alt_analysis_set.fasta \
      --threads 8 \
      --output-prefix HG003
[2024-03-14][09:18:48][hificnv][INFO] Starting hificnv
  [2024-03-14][09:18:48][hificnv][INFO] cmdline: hificnv --bam HG003.bam --expected-cn expected_cn.hg38.XX.bed --exclude test_profile.hificnv.exclude_regions.bed --maf HG003.sorted.vcf.gz --ref GRCh38_no_alt_analysis_set.fasta --threads 8 --output-prefix HG003
  [2024-03-14][09:18:48][hificnv][INFO] Running on 8 threads
  [2024-03-14][09:18:48][hificnv][INFO] Reading reference genome from file 'GRCh38_no_alt_analysis_set.fasta'
  [2024-03-14][09:18:59][hificnv][INFO] Reading excluded regions from file 'test_profile.hificnv.exclude_regions.bed'
  [2024-03-14][09:18:59][hificnv][INFO] Reading expected CN regions from file 'expected_cn.hg38.XX.bed'
  [2024-03-14][09:18:59][hificnv][INFO] Processing alignment file 'HG003.bam'
  [2024-03-14][09:22:09][hificnv][INFO] Writing depth track to bigwig file: 'HG003.Sample0.depth.bw'
  [2024-03-14][09:22:09][hificnv][INFO] Scanning minor allele frequency data from file 'HG003.sorted.vcf.gz'
  [2024-03-14][09:22:09][hificnv][INFO] Writing bigwig maf track to file: 'HG003.HG003.maf.bw'
  [2024-03-14][09:22:09][hificnv][INFO] Segmenting copy number
  thread 'main' panicked at 'No usable bins found for haploid depth estimation.', src/copy_number_segmentation.rs:93:5
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

ctsa commented Mar 14, 2024

Hi @fellen31 - The short answer is that the CNV calling method isn't going to work well on narrowly targeted data because it needs enough sequence information to set an expected haploid coverage level.

We can certainly provide a better error message for this case as a first step to clarifying the issue. A quick workaround should be to expand the selected regions from the BAM file, but I can't immediately say how much expansion is needed. We can look at whether we can derive a more specific suggestion from your BED file.
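
To illustrate what expanding the selected regions could look like (a sketch only, not a tested recommendation; the 1 Mb padding and file names are placeholders):

# pad each target by 1 Mb on both sides, then re-subset the BAM
bedtools slop -i test_data.bed -g genome.txt -b 1000000 > test_data.expanded.bed
samtools view -M -L test_data.expanded.bed -h -O BAM HG003_PacBio_GRCh38.bam > HG003.expanded.bam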

fellen31 commented Mar 14, 2024

Thanks for the quick reply. Is the expected haploid coverage estimated from the whole genome, and not just from the regions that are not excluded with --exclude?

A warning instead of a panic would be great, but I also understand if you need a hard cutoff so as not to produce inaccurate results. Another workaround would be to extract the regions from the reference, turn each region into a pseudo-chromosome, align the reads, and then run HiFiCNV against that, which will happily give me some (inaccurate) calls.
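
Roughly, that workaround looks like this (an illustrative sketch; assumes samtools and pbmm2 are available, and note that BED starts are 0-based while faidx regions are 1-based):

# extract each target region as its own pseudo-chromosome
samtools faidx GRCh38_no_alt_analysis_set.fasta \
    chr20:2642594-2668393 chrX:67534022-67740619 \
    chr5:70915031-70963942 chr5:70039639-70088522 > regions_ref.fasta
samtools faidx regions_ref.fasta
# re-align the subset reads and run HiFiCNV against the small reference
pbmm2 align regions_ref.fasta HG003.bam HG003.regions.bam --sort
hificnv --bam HG003.regions.bam --ref regions_ref.fasta --output-prefix HG003.regions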

If you have any ideas on what a minimal test dataset could look like, that would be great.

ctsa commented Mar 18, 2024

I tried to reproduce this issue: without any exclude file there are plenty of bins available for haploid depth estimation (whether it's accurate in this context is another matter), so for your case it seems likely that your exclude file could be filtering out the regions in test_data.bed. Do you think this could be happening?

Haploid coverage is estimated from the whole non-excluded genome, but with a zero-depth exclusion added to make the process more stable for cases like this. There is a bit more detail on this in the methods documentation:

https://github.com/PacificBiosciences/HiFiCNV/blob/main/docs/methods.md#segmentation
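
One quick way to check whether the exclude regions overlap the targets (a sketch; assumes bedtools is available, and this is not a HiFiCNV step):

# list any test_data.bed targets that overlap the exclude file
bedtools intersect -a test_data.bed -b test_profile.hificnv.exclude_regions.bed -u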

fellen31 commented

Thanks for your help. Good to hear that it seems to work for you. Even without the exclude file I can't seem to get it to work with my data, though. Would it be possible for you to share the BAM file that works for you? Or the number of reads / coverage you have, in case that could be the issue?

ctsa commented Mar 19, 2024

Can you clarify -- do you still see the error when you don't use any exclude file with the HG003 BAM file you've described?

fellen31 commented Apr 4, 2024

Hi, sorry for the late reply. I realised it was actually HG002, not HG003, but yes, I still saw the error. I solved it by using a smaller reference genome. Feel free to close the issue.
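
For anyone hitting the same error: "smaller reference genome" could mean something along these lines (an illustrative guess at the approach, not the exact commands used; file names are placeholders):

# build a reference containing only the chromosomes in test_data.bed, then re-align and re-run
samtools faidx GRCh38_no_alt_analysis_set.fasta chr5 chr20 chrX > subset_ref.fasta
samtools faidx subset_ref.fasta
pbmm2 align subset_ref.fasta HG002.bam HG002.subset.bam --sort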

ctsa commented Apr 4, 2024

Okay, good to hear. From the above conversation it sounds like your regional selection from the HG003 BAM should, in theory, run on HiFiCNV even without a smaller genome. If you have the result without the exclude file, or are able to share the BAM file, we'd be happy to take a look and help; just open a new issue in that case.

ctsa closed this as completed Apr 4, 2024