Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph method for PDHMM event groups that unifies finding/merging and overlap/mutual exclusion #8366

Merged
merged 1 commit into from
Jun 27, 2023

Conversation

davidbenjamin
Copy link
Contributor

@davidbenjamin davidbenjamin commented Jun 15, 2023

@jamesemery The fun begins. No change in output yet, but a non-trivial change in implementation.

Instead of making preliminary event groups according to overlap, then merging them according to mutually excluded events, this PR does it all in one step while automatically handling transitivity by treating as a matter of finding connected components of a graph whose vertices are events and whose edges are reasons (overlap and mutex) for events to be in the same event group.

All the failures are from WDL tests. I assume those are not related.

@codecov
Copy link

codecov bot commented Jun 15, 2023

Codecov Report

Merging #8366 (d374080) into master (a86d8a3) will decrease coverage by 0.021%.
The diff coverage is 70.742%.

❗ Current head d374080 differs from pull request most recent head 62166d4. Consider uploading reports for the commit 62166d4 to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##              master     #8366       +/-   ##
===============================================
- Coverage     86.314%   86.293%   -0.021%     
- Complexity     39474     39480        +6     
===============================================
  Files           2358      2358               
  Lines         185234    185226        -8     
  Branches       20323     20339       +16     
===============================================
- Hits          159882    159837       -45     
- Misses         18196     18223       +27     
- Partials        7156      7166       +10     
Impacted Files Coverage Δ
...walkers/haplotypecaller/HaplotypeCallerEngine.java 67.045% <0.000%> (ø)
...tools/walkers/haplotypecaller/ramps/RampUtils.java 2.198% <0.000%> (ø)
...ava/org/broadinstitute/hellbender/utils/Utils.java 80.639% <33.333%> (ø)
...pecaller/readthreading/ReadThreadingAssembler.java 58.481% <41.667%> (-4.244%) ⬇️
...alkers/haplotypecaller/AlleleLikelihoodWriter.java 81.429% <66.667%> (ø)
...PartiallyDeterminedHaplotypeComputationEngine.java 76.123% <73.869%> (-3.587%) ⬇️
...institute/hellbender/utils/haplotype/EventMap.java 73.636% <100.000%> (ø)
...nstitute/hellbender/utils/haplotype/Haplotype.java 91.346% <100.000%> (ø)
.../utils/haplotype/PartiallyDeterminedHaplotype.java 88.136% <100.000%> (ø)
.../hellbender/utils/haplotype/HaplotypeUnitTest.java 97.561% <100.000%> (ø)

... and 2 files with indirect coverage changes

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. It certainly reads better... the jury is still out on wether the graph overhead and the graph connectivity inspector is necessarily the most efficient approach here when there are a lot of events... I doubt this is going to be that severe cost however. I also think this is going to be much easier to extend for the jointdetection code than the current factoring which is good...

I would consider running a much larger section (like a chromosome) to inspect for mismatches with the old code... i'm not sure i'm convinced that the integration is sufficient to catch edge cases with this code

@davidbenjamin davidbenjamin force-pushed the db_dragen_event_groups branch from d374080 to 62166d4 Compare June 25, 2023 22:52
@davidbenjamin
Copy link
Contributor Author

@jamesemery Ready for re-review.

I ran this branch against master in --dragen-378-concordance-mode on a PCR plus NA12878 sample over all of chromosome 20. The output was completely identical and runtime appeared unaffected. If anything I think this branch had a tendency to be faster on all of the longest-running of the 100-way scattered jobs.

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for double checking the outputs on a larger section. 👍

@davidbenjamin davidbenjamin merged commit 0cb0861 into master Jun 27, 2023
@davidbenjamin davidbenjamin deleted the db_dragen_event_groups branch June 27, 2023 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants