Inconsistent calling results after rerun with the same set of bam files and parameters #621
Comments
Does that mean that gridss can produce different results when a sample is rerun, i.e. that it has stochastic properties?
It appears that there are different IDs in the output. Look at this: BEID=asm170-4188,asm172-5972 (first run) vs BEID=asm170-4188,asm172-6016 (second run). This seems to suggest that the same samples were not run.
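A minimal sketch for spotting this kind of difference, assuming the BEID values sit in the INFO column as a comma-separated list, as in the two strings quoted above. The helper names `parse_info`, `beids` and `beid_diff` are made up for illustration and are not part of gridss.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO string into a dict; flag entries (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def beids(info: str) -> set:
    """Return the set of BEID entries in an INFO string (empty if absent)."""
    value = parse_info(info).get("BEID")
    if not isinstance(value, str):
        return set()
    return {entry for entry in value.split(",") if entry}


def beid_diff(info_a: str, info_b: str):
    """Return (entries only in the first run, entries only in the second run)."""
    a, b = beids(info_a), beids(info_b)
    return sorted(a - b), sorted(b - a)


# The two INFO fragments quoted in the comment above.
run1 = "BEID=asm170-4188,asm172-5972"
run2 = "BEID=asm170-4188,asm172-6016"

only_first, only_second = beid_diff(run1, run2)
print("only in first run: ", only_first)
print("only in second run:", only_second)
```

Here only the asm172 entry differs between the two runs, while asm170-4188 is shared.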
I have reproduced this behaviour: running the same files 4 times gives different qualities for the same variants. Variant coordinates can also change by 1-2bp.
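One way to quantify this across reruns is to pair up records from two output VCFs while allowing a small coordinate shift, then compare QUAL. The sketch below is a simplification under stated assumptions: it matches on chromosome and position only (it ignores ALT and the partner breakend), it is quadratic in the number of records, and the example records are made up for illustration rather than taken from a real run.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, int, float]  # (chrom, pos, qual)


def read_records(lines: Iterable[str]) -> List[Record]:
    """Extract (CHROM, POS, QUAL) from VCF text lines, skipping headers."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[5] == ".":  # missing QUAL
            continue
        records.append((cols[0], int(cols[1]), float(cols[5])))
    return records


def compare_runs(run_a: List[Record], run_b: List[Record], max_shift: int = 2):
    """Pair each record of run_a with the nearest run_b record on the same
    chromosome within max_shift bp. Returns (matched, unmatched)."""
    matched, unmatched = [], []
    for chrom, pos, qual in run_a:
        candidates = [
            (abs(p - pos), p, q)
            for c, p, q in run_b
            if c == chrom and abs(p - pos) <= max_shift
        ]
        if not candidates:
            unmatched.append((chrom, pos, qual))
            continue
        _, pos_b, qual_b = min(candidates)
        matched.append((chrom, pos, pos_b, qual, qual_b))
    return matched, unmatched


# Hypothetical records: IDs, QUAL values and the chr5 site are invented.
vcf_a = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr10\t43111683\ta1\tN\t.\t500.0\t.\t.",
    "chr5\t1000\ta2\tN\t.\t90.0\t.\t.",
]
vcf_b = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr10\t43111684\tb1\tN\t.\t420.5\t.\t.",
]

matched, unmatched = compare_runs(read_records(vcf_a), read_records(vcf_b))
for chrom, pos_a, pos_b, qual_a, qual_b in matched:
    print(chrom, pos_a, pos_b, "shift", pos_b - pos_a, "QUAL diff", qual_b - qual_a)
print("unmatched:", unmatched)
```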
Thanks for testing, that sounds very worrying indeed!
It looks like there's a difference of 6 split reads that were assembled in the first but not the second run. This seems to indicate that #450 did not fully fix the issue of multi-threaded determinism. Can you confirm:
This indicates that the assembly sub-process (asm172) processing the RHS had output a different number of assemblies by the time it got to chr10:43111683. That is, the root cause could be any difference in the 10Mb leading up to chr10:43111683. That said, if the inputs are identical then the output should be identical, which clearly is not the case here.
Note that 'identical' inputs from the point of view of the assembly sub-process are not necessarily the same as identical inputs to gridss overall. The GRIDSS pre-processing does a (bwa) realignment of soft-clipped reads (required to support bowtie2 inputs, optional for bwa-aligned inputs). The input split reads/discordant read pairs to the asm/variant calling steps are the reads in the input.bam.gridss.working/input.bam.sv.bam files, not the input files given on the command line. Any non-determinism here will propagate downstream to asm & calling.
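A rough first check of whether the intermediate files already differ between reruns is to compare checksums of the `*.sv.bam` files in the two working directories. This is only a sketch: the function names and the idea of keeping each run's working directory side by side are assumptions, and a byte-level mismatch only flags a file for closer inspection, since headers or compression may differ even when the reads themselves do not.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(dir_a, dir_b, suffix: str = ".sv.bam") -> dict:
    """Compare files ending in `suffix` under two run directories.

    Returns {relative path: reason} for files that are missing from one
    run or whose bytes differ; identical files are omitted.
    """
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    names_a = {p.relative_to(dir_a) for p in dir_a.rglob("*" + suffix) if p.is_file()}
    names_b = {p.relative_to(dir_b) for p in dir_b.rglob("*" + suffix) if p.is_file()}
    report = {}
    for rel in sorted(names_a | names_b):
        if rel not in names_a:
            report[str(rel)] = "missing in first run"
        elif rel not in names_b:
            report[str(rel)] = "missing in second run"
        elif sha256_of(dir_a / rel) != sha256_of(dir_b / rel):
            report[str(rel)] = "content differs"
    return report
```

Calling `differing_files("run1_workdir", "run2_workdir")` (both paths hypothetical) returns an empty dict when every matching file is byte-identical.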
I have the same problem. Here are the results of two runs on the same files:
Assemblies that differ between runs:
Hi!
Using one normal and two tumor bam files, I got different calling results for a clinically important gene fusion, KIF5B-RET.
These are the records of two breakpoints, called the first time from the specified set of bam files:
And these are the records of the same two breakpoints, called the second time from the same set of bam files:
QUAL and several INFO and FORMAT fields got different values (more noticeable for the second tumor sample) after rerunning gridss with exactly the same set of input bams and parameters. Especially concerning is the difference in the read counts supporting the variant allele (VF), which became considerably lower in the second run (was 11, became 8).
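A minimal sketch for listing such per-sample differences between two runs, assuming standard VCF layout (FORMAT keys in column 9, one column per sample after it). The function names are made up, and the example record is hypothetical apart from the 11 and 8 mentioned above.

```python
def sample_field(vcf_line: str, field: str) -> list:
    """Return the value of one FORMAT field for every sample in a VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    if field not in keys:
        return []
    idx = keys.index(field)
    values = []
    for sample in cols[9:]:
        parts = sample.split(":")
        # Trailing fields may be omitted in a sample column; treat as missing.
        values.append(parts[idx] if idx < len(parts) else ".")
    return values


def field_changes(line_a: str, line_b: str, field: str = "VF") -> list:
    """Return (sample index, value in run A, value in run B) where they differ."""
    a, b = sample_field(line_a, field), sample_field(line_b, field)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical records: normal, tumor 1, tumor 2. Only 11 -> 8 echoes the post.
first_run = "chr10\t43111683\tbnd1\tN\t.\t100\t.\t.\tVF\t0\t5\t11"
second_run = "chr10\t43111683\tbnd1\tN\t.\t100\t.\t.\tVF\t0\t5\t8"

for index, before, after in field_changes(first_run, second_run):
    print(f"sample {index}: VF {before} -> {after}")
```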
Could you please let me know whether there is any way to avoid this sort of stochasticity or, if this behaviour is inevitable, give some advice on how to cope with it?
This is an important issue, since it may affect somatic filters and increase the number of false negative variants.