[BQ to VCF] order INFO fields as they are listed in the header #452

Open
mbookman opened this issue Mar 9, 2019 · 0 comments

mbookman commented Mar 9, 2019

This does not appear to be required by the VCF spec, but it may be worth following as a convention.

In a joint-genotyped VCF file from the Broad Institute, the INFO keys are sorted alphabetically, both in the header and in the INFO values of each record. For example:

The first few ##INFO directives from the header:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
...

And an example INFO value from one record:

AC=4;AF=4.630e-03;AN=864;DP=2456;ExcessHet=0.0001;FS=0.000;InbreedingCoeff=0.0339;MLEAC=11;MLEAF=0.013;MQ=36.12;QD=27.40;SOR=4.977;VQSLOD=-5.582e+01;culprit=SOR;CSQ=T|intergenic_variant|MODIFIER||||||||||||||||1||||SNV||||||||||||||||||||||||||||||||||||||||||
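Following the header's order first requires knowing that order. The sketch below collects INFO IDs in the order their ##INFO lines appear. It assumes `ID` is the first key inside `<...>`, as in every header line quoted above; `info_ids_in_header_order` is a hypothetical helper, not part of bq_to_vcf.

```python
import re
from typing import Iterable, List

# Assumption: ID is the first key inside <...>, as in the header lines above.
_INFO_ID = re.compile(r'^##INFO=<ID=([^,>]+)')


def info_ids_in_header_order(header_lines: Iterable[str]) -> List[str]:
    """Return INFO IDs in the order their ##INFO lines appear in the header."""
    ids = []
    for line in header_lines:
        match = _INFO_ID.match(line)
        if match:  # non-INFO lines (##FORMAT, ##FILTER, ...) are skipped
            ids.append(match.group(1))
    return ids


# The ##INFO lines quoted above from the Broad header.
header = [
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">',
    '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">',
    '##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">',
    '##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">',
    '##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">',
]
print(info_ids_in_header_order(header))
# ['AC', 'AF', 'AN', 'BaseQRankSum', 'ClippingRankSum', 'DB', 'DP']
```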

I ran an export using bq_to_vcf, passing the original header as the representative header file, and the same INFO value looks like this:

MLEAC=11;AC=4;FS=0.0;MLEAF=0.013;InbreedingCoeff=0.0339;culprit=SOR;ExcessHet=0.0001;VQSLOD=-55.82;AF=0.00463;AN=864;SOR=4.977;MQ=36.12;QD=27.4;CSQ=T|intergenic_variant|MODIFIER||||||||||||||||1||||SNV||||||||||||||||||||||||||||||||||||||||||;DP=2456
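One rough way to confirm that the two INFO values differ only in key order and numeric formatting is to parse both into maps and compare them. This is a sketch under a few assumptions: values are compared as floats when both parse as numbers and as strings otherwise (so multi-value fields like `0.1,0.2` fall back to string comparison), flags map to `None`, and it relies on Python 3.7+ dicts preserving insertion order. `parse_info` and `same_value` are hypothetical names.

```python
from typing import Dict, Optional

# INFO values copied from the issue; the CSQ annotation is identical in both.
CSQ = ('T|intergenic_variant|MODIFIER||||||||||||||||1||||SNV'
       '||||||||||||||||||||||||||||||||||||||||||')
BROAD_INFO = ('AC=4;AF=4.630e-03;AN=864;DP=2456;ExcessHet=0.0001;FS=0.000;'
              'InbreedingCoeff=0.0339;MLEAC=11;MLEAF=0.013;MQ=36.12;QD=27.40;'
              'SOR=4.977;VQSLOD=-5.582e+01;culprit=SOR;CSQ=' + CSQ)
EXPORT_INFO = ('MLEAC=11;AC=4;FS=0.0;MLEAF=0.013;InbreedingCoeff=0.0339;'
               'culprit=SOR;ExcessHet=0.0001;VQSLOD=-55.82;AF=0.00463;AN=864;'
               'SOR=4.977;MQ=36.12;QD=27.4;CSQ=' + CSQ + ';DP=2456')


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO string into {key: value}; flags get the value None."""
    fields = {}
    for entry in info.split(';'):
        key, sep, value = entry.partition('=')  # split on the first '=' only
        fields[key] = value if sep else None
    return fields


def same_value(a: Optional[str], b: Optional[str]) -> bool:
    """Equal as strings, or equal as floats when both parse as numbers."""
    if a == b:
        return True
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return False


broad = parse_info(BROAD_INFO)
export = parse_info(EXPORT_INFO)
print(broad.keys() == export.keys())                        # True: same keys
print(all(same_value(broad[k], export[k]) for k in broad))  # True: same values
print(list(broad) == list(export))                          # False: different order
```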

Apart from a few small differences in numeric formatting (e.g. FS=0.000 vs. FS=0.0, VQSLOD=-5.582e+01 vs. VQSLOD=-55.82), the values are the same.
The INFO field ordering, however, appears arbitrary.
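A minimal sketch of the requested behavior, independent of the bq_to_vcf code base: given the INFO string and the header's INFO IDs in order, sort the `;`-separated entries by each key's position in the header. `order_info_fields` is a hypothetical helper, and placing keys that are not declared in the header last (in their original relative order) is an assumption; the issue does not say how such keys should be handled.

```python
from typing import List

# IDs of the first few ##INFO lines in the Broad header quoted above.
HEADER_IDS = ['AC', 'AF', 'AN', 'BaseQRankSum', 'ClippingRankSum', 'DB', 'DP']


def order_info_fields(info: str, header_ids: List[str]) -> str:
    """Reorder the ';'-separated entries of a VCF INFO string to header order.

    Keys not declared in header_ids sort after all declared keys; sorted()
    is stable, so those keys keep their original relative order.
    """
    rank = {key: i for i, key in enumerate(header_ids)}
    unknown = len(rank)
    entries = info.split(';')
    # The key is everything before the first '='; flags such as DB have no '='.
    ordered = sorted(entries, key=lambda e: rank.get(e.partition('=')[0], unknown))
    return ';'.join(ordered)


# A made-up, shuffled INFO string built from keys that appear in the issue.
print(order_info_fields('DP=2456;AC=4;DB;AF=0.00463;AN=864;culprit=SOR', HEADER_IDS))
# AC=4;AF=0.00463;AN=864;DB;DP=2456;culprit=SOR
```

Sorting by header rank (rather than alphabetically) matches the request in the title: whatever order the representative header uses, the output records follow it.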

allieychen self-assigned this Mar 11, 2019