As of release 156, dbSNP includes rsIDs larger than 2^31, which bcftools can no longer handle properly:
$ wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.40.gz{,.tbi}
$ tabix GCF_000001405.40.gz NC_000001.11:6259533-6259533
NC_000001.11 6259533 rs2148352434 C T . . RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
$ bcftools view -H GCF_000001405.40.gz -r NC_000001.11:6259533-6259533
[W::vcf_parse_info] Extreme INFO/RS value encountered and set to missing at NC_000001.11:6259533
NC_000001.11 6259533 rs2148352434 C T . . RS=.;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
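For reference, the warning is consistent with the RS value simply not fitting in a signed 32-bit integer. A quick check in the shell, assuming a shell with 64-bit arithmetic (such as bash on a 64-bit system); the variable names are only for illustration:

```shell
# rsID taken from the record above
rsid=2148352434
# Largest value a signed 32-bit integer can hold: 2^31 - 1
int32_max=$(( (1 << 31) - 1 ))

if [ "$rsid" -gt "$int32_max" ]; then
    result="overflow"
else
    result="fits"
fi
echo "rsid=$rsid int32_max=$int32_max -> $result"
```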
If HTSlib is compiled with the -DVCF_ALLOW_INT64 option, it works fine:
$ bcftools view -H GCF_000001405.40.gz -r NC_000001.11:6259533-6259533
NC_000001.11 6259533 rs2148352434 C T . . RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
However, the record can then no longer be represented as BCF (binary VCF), which is a huge problem:
$ bcftools view -Ou GCF_000001405.40.gz -r NC_000001.11:6259533-6259533 | bcftools view -H
[E::bcf_write] Data at NC_000001.11:6259533 contains 64-bit values not representable in BCF. Please use VCF instead
[main_vcfview] Error: cannot write to (null)
Is there a discussion in samtools/hts-specs about updating the BCF specification to support 64-bit values?
Changing the BCF specification is not an easy task and may take a long time, even if there is goodwill to do it.
The problem could be addressed more easily on the dbSNP side if INFO/RS were a string rather than an integer.
I am getting the same error when trying to annotate with dbSNP 156. I understand from the discussion that this issue can't be fixed for now. But can you help me compile HTSlib with the -DVCF_ALLOW_INT64 option? I read the documentation, which states that this option needs to be added manually in the Makefile. I tried that and it's not working. I made this change in the Makefile in the htslib-1.20 folder, with bcftools-1.20.
Since I have no experience developing with C++ and make, could you please specify the exact changes to be made in the Makefile?
Is this correct?
CFLAGS = -g -Wall -O2 -fvisibility=hidden -DVCF_ALLOW_INT64=1
Yes, that is correct; one must compile with -DVCF_ALLOW_INT64. Try forcing recompilation of vcf.c with touch vcf.c, check what the standard make command line looks like, and add -DVCF_ALLOW_INT64 to it. Note that this has not been terribly well tested; hopefully the code has not deteriorated too much.
Perhaps a simpler workaround is to edit the VCF header using the reheader command, changing the offending tag to Type=String:
bcftools view -h file.vcf.gz > hdr.txt
# edit hdr.txt and change the offending tag to Type=String
bcftools reheader -h hdr.txt -o new.bcf file.vcf.gz
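The manual header edit could probably be scripted with sed. The sketch below is only an illustration: it assumes the offending tag is INFO/RS and that its header line looks like the sample shown (the exact Description text in the dbSNP file may differ), and it runs on a single sample line rather than on a real hdr.txt.

```shell
# Sample header line; assumed form of the dbSNP INFO/RS definition
hdr='##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)">'

# Change the type only on the INFO/RS definition line, leaving other tags alone
fixed=$(printf '%s\n' "$hdr" | sed '/^##INFO=<ID=RS,/s/Type=Integer/Type=String/')

echo "$fixed"
```

The same sed expression could be run on hdr.txt in place of the manual edit, before passing the result to bcftools reheader.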