Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative values in NM field #130

Closed
sergiigladchuk opened this issue Aug 23, 2018 · 3 comments
Closed

Negative values in NM field #130

sergiigladchuk opened this issue Aug 23, 2018 · 3 comments

Comments

@sergiigladchuk
Copy link

Good day,
I have noticed that except issue with DP being less then VD (which leads to strange AF calculation) that I hope is going to be fixed soon, NM (Mean mismatches in reads) field is getting negative values. For our RNA-seq data we have about 2% of variants with NM being negative. Please fix the issue, since we would like to use this field for false positive filtering.
Another observation is that very few negative NM fields are observed in variants produced from DNA-seq data, so I would say that this issue is RNA-seq relevant only, and might be connected to #66

Here is small test case, which produces 14 variants with negative NM out of 35 variants called from .bam file attached.

Command used:

$ vardict-java -G /scratch/sergii/add_files/hg38.fa -f 0.02 -c 1 -S 2 -E 3 -g 4 -Q 10 -r 2 -q 20 -b xxxxx.bam -th 8 xxxxx-RNA_Err.bed | VarDict/teststrandbias.R | VarDict/var2vcf_valid.pl -A -N xxxxx -f 0.02 | bgzip -c > xxxxx.vcf.gz

Examples of variants with negative NM:

chr1	35738058	.	TTTC	T	53	PASS	SAMPLE=xxxxx;TYPE=Deletion;DP=6;END=35738061;VD=3;AF=0.5;BIAS=2:2;REFBIAS=1:2;VARBIAS=2:1;PMEAN=13.3;PSTD=1;QUAL=33.8;QSTD=1;SBF=1;ODDRATIO=3.1057829678862;MQ=60;SN=6;HIAF=0.6000;ADJAF=0.3333;SHIFT3=12;MSI=5;MSILEN=3;NM=-3.0;HICNT=3;HICOV=5;LSEQ=AAACTGACTGTCCTCCCCAA;RSEQ=TTCTTCTTCTTCAGCTGTAA	GT:DP:VD:AD:AF:RD:ALD	1/0:6:3:3,3:0.5:1,2:2,1
chr1	120069350	.	G	C	36	PASS	SAMPLE=xxxxx;TYPE=SNV;DP=57;END=120069350;VD=2;AF=0.0351;BIAS=2:0;REFBIAS=38:17;VARBIAS=2:0;PMEAN=24.5;PSTD=1;QUAL=36;QSTD=1;SBF=1;ODDRATIO=0;MQ=60;SN=4;HIAF=0.0408;ADJAF=0;SHIFT3=3;MSI=2;MSILEN=2;NM=-1.5;HICNT=2;HICOV=49;LSEQ=TCACCATGCGCGGGGGCCGC;RSEQ=CAGCACAGCCAGAGCGCCAG	GT:DP:VD:AD:AF:RD:ALD	0/1:57:2:55,2:0.0351:38,17:2,0
chr2	130372355	.	AGACGGG	A	201	PASS	SAMPLE=xxxxx;TYPE=Deletion;DP=154;END=130372361;VD=74;AF=0.4805;BIAS=2:2;REFBIAS=21:59;VARBIAS=20:54;PMEAN=26.8;PSTD=1;QUAL=32.4;QSTD=1;SBF=1;ODDRATIO=1.04031209362809;MQ=60;SN=23.667;HIAF=0.4830;ADJAF=0.0844;SHIFT3=19;MSI=4;MSILEN=1;NM=-5.4;HICNT=71;HICOV=147;LSEQ=GGGCGCCGGGAGTGGGACGC;RSEQ=GACGGGGACGGGGACGGGGGGT:DP:VD:AD:AF:RD:ALD	0/1:154:74:80,74:0.4805:21,59:20,54

test_case.zip

@PolinaBevad
Copy link
Contributor

@sergiigladchuk , thank you for reporting the issue!

I reproduced the situation on your data and at first glance, it happens when there are indels in the cigar, but NM = 0 in the read. Mismatch value for variant is calculated now as: mismatch = NM - I - D (lh3/minimap2#25), and it will be negative in this case.

Are you sure that NM tag value is correct in BAM file? I tried to validate it with Picard ValidateSamFile and it doesn't show any errors. But when I use Picard SetNmMdAndUqTags, that can fix NM value on the specific reference, NM fields in the reads that cause this error were changed to the bigger values.

We will also look closely at this situation and analyze whether this is an issue.

@sergiigladchuk
Copy link
Author

Thank you @PolinaBevad , for clarification.
I have created related issue for HISAT2, which was used for alignment - Wrong calculation for NM tag. I guess there is/was unclear definition of NM tag, if it should take into account entire gap length or just its opening. I hope that some agreement was reached, and one logic for its calculations will be used for all related tools.

@PolinaBevad
Copy link
Contributor

@sergiigladchuk, hello!

We added adjusting of the NM field to zero in case of such errors in the VarDictJava and Perl versions to avoid negative values in output if they appear due to alignment problems. Release 1.5.7. (https://github.com/AstraZeneca-NGS/VarDictJava/releases/tag/v1.5.7) contains this fix.

I will close this issue, please, re-open it if you will have another question on this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants