v1.6 hangs when only (GRCh38) alt-mapping reads present. #769

Closed
SHuang-Broad opened this issue Feb 4, 2024 · 5 comments

@SHuang-Broad

Describe the issue:
After upgrading to v1.6, we noticed that the program hangs on a sharded BAM that contains only reads mapped to alt contigs.

Setup

  • Operating system: on GCE via Google Life Sciences API (through Cromwell)
  • DeepVariant version: v1.6
  • Installation method (Docker, built from source, etc.): official v1.6 docker
  • Type of data: Both PacBio HiFi and ONT (10.4), on GRCh38.

Steps to reproduce:

  • Command
/opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref=GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
    --haploid_contigs chrX,chrY \
    --par_regions_bed GRCh38.PAR.bed \
    --reads=/cromwell_root/<sample_id>.alts.bam \
    --output_vcf=/cromwell_root/dv_output/<sample_id>.alts.deepvariant.vcf.gz \
    --output_gvcf=/cromwell_root/dv_output/<sample_id>.alts.deepvariant.g.vcf.gz \
    --num_shards=16
  • Relevant log
    (Note that it says "0 examples", so I suspect the hang occurs whenever no examples are available, not just when only alt-mapping reads are present. For example, reads simulated error-free from the reference itself would probably trigger the same issue.)
/cromwell_root/tmp.cd83af44/tmpuzrx3yrs/make_examples.tfrecord-00011-of-00016.gz.example_info.json
I0203 17:23:03.253894 135328978921280 make_examples_core.py:2958] example_shape = None
I0203 17:23:03.254237 135328978921280 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 7, 9, 10]
I0203 17:23:03.255900 135328978921280 make_examples_core.py:301] Task 11/16: Found 0 candidate variants
I0203 17:23:03.256017 135328978921280 make_examples_core.py:301] Task 11/16: Created 0 examples
I0203 17:23:04.930985 137565708298048 make_examples_core.py:301] Task 7/16: Writing example info to /cromwell_root/tmp.cd83af44/tmpuzrx3yrs/make_examples.tfrecord-00007-of-00016.gz.example_info.json
I0203 17:23:04.931358 137565708298048 make_examples_core.py:2958] example_shape = None
I0203 17:23:04.931699 137565708298048 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 7, 9, 10]
I0203 17:23:04.933463 137565708298048 make_examples_core.py:301] Task 7/16: Found 0 candidate variants
I0203 17:23:04.933572 137565708298048 make_examples_core.py:301] Task 7/16: Created 0 examples
I0203 17:23:09.199501 136895166957376 make_examples_core.py:301] Task 13/16: Writing example info to /cromwell_root/tmp.cd83af44/tmpuzrx3yrs/make_examples.tfrecord-00013-of-00016.gz.example_info.json
I0203 17:23:09.199875 136895166957376 make_examples_core.py:2958] example_shape = None
I0203 17:23:09.200180 136895166957376 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 7, 9, 10]
I0203 17:23:09.201941 136895166957376 make_examples_core.py:301] Task 13/16: Found 0 candidate variants
I0203 17:23:09.202048 136895166957376 make_examples_core.py:301] Task 13/16: Created 0 examples

real 112m20.375s
user 1760m59.767s
sys 11m47.541s

***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/cromwell_root/tmp.cd83af44/tmpuzrx3yrs/call_variants_output.tfrecord.gz" --examples "/cromwell_root/tmp.cd83af44/tmpuzrx3yrs/make_examples.tfrecord@16.gz" --checkpoint "/opt/models/pacbio"

/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).

For more information see: https://github.com/tensorflow/addons/issues/2807

warnings.warn(
I0203 17:23:14.218397 132068663560000 call_variants.py:471] Total 1 writing processes started.
W0203 17:23:14.224790 132068663560000 call_variants.py:482] Unable to read any records from /cromwell_root/tmp.cd83af44/tmpuzrx3yrs/make_examples.tfrecord@16.gz. Output will contain zero records.
I0203 17:23:14.225926 132068663560000 call_variants.py:623] Complete: call_variants.

The program then hangs for 10+ hours (it is Feb. 04, 04:05 UTC as I write this, and the program still appears to be running).
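
As a stopgap against a hang like this, the pipeline step could be wrapped in a timeout so that it fails instead of running indefinitely. The following is a minimal sketch, not part of DeepVariant: `run_with_timeout` is a hypothetical helper, the one-second limit is only for illustration, and the sleeping child process stands in for the real command.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Stand-in for the real command: a child process that sleeps past the limit.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
if run_with_timeout(hung, timeout_s=1) is None:
    print("step timed out; treat as a hang")
```

Note that the timeout only terminates the direct child process; anything that child spawned may need separate cleanup.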

We've observed this for both ONT and HiFi data on multiple samples, further suggesting this isn't a data issue.
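
To check whether a run falls into the zero-example case suspected above, the per-task counts can be summed from the make_examples log. This is a minimal sketch, assuming the log lines keep the `Task i/n: Created N examples` format shown in the log above; `total_examples` is a hypothetical helper and not part of DeepVariant.

```python
import re

# Matches lines such as "... Task 11/16: Created 0 examples" from the log above.
_CREATED_RE = re.compile(r"Task (\d+)/(\d+): Created (\d+) examples")


def total_examples(log_text):
    """Return (tasks_seen, total_examples) summed over all reporting tasks."""
    per_task = {}
    for match in _CREATED_RE.finditer(log_text):
        task, _num_tasks, created = (int(g) for g in match.groups())
        per_task[task] = created  # keep the last count reported by each task
    return len(per_task), sum(per_task.values())


sample_log = """\
I0203 17:23:03.256017 135328978921280 make_examples_core.py:301] Task 11/16: Created 0 examples
I0203 17:23:04.933572 137565708298048 make_examples_core.py:301] Task 7/16: Created 0 examples
I0203 17:23:09.202048 136895166957376 make_examples_core.py:301] Task 13/16: Created 0 examples
"""

tasks_seen, n_examples = total_examples(sample_log)
if n_examples == 0:
    print(f"{tasks_seen} tasks reported, 0 examples in total")
```

A total of zero across all tasks would match the condition under which the hang was observed here.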

Thanks!
Steve

@SHuang-Broad
Author

This is possibly related to #764

@AndrewCarroll
Collaborator

Hi @SHuang-Broad

Thank you for the report. We think the two issues are linked, and we are working on a patch that we plan to release to address them.

@kishwarshafin
Collaborator

Hi @SHuang-Broad,

Please try the following Docker images, which incorporate a patch that should fix your issue:

docker pull google/deepvariant:CL602468145
docker pull google/deepvariant:CL602468145-gpu

@SHuang-Broad
Author

Thanks, @kishwarshafin. Let me try this out and report back.

@kishwarshafin
Collaborator

@SHuang-Broad, can you please try the v1.6.1 Docker image? If the issue isn't resolved, please feel free to reopen this bug.
