This is just a list of things that could be improved, for whenever we next revise the format. (So we don't forget any). I'm not suggesting an immediate update, but to gather ideas in one place.
Magic number with version string.
Add number of reads / bases as columns. This will make very approximate coverage plots trivial as well as improve tools like samtools idxstats so they work on both BAM and CRAM. What else in idxstats needs replicating?
A generation UUID. If coupled with an identical UUID in the SAM header then we can use this to spot cases where the CRAM file has been updated without rebuilding the index. (We want to add this same feature to .BAI and .CSI too.)
Check the utility of the container size column. I think currently it is the number of remaining bytes after decoding the container header (and perhaps the compression header?). More useful for random slicing would simply be the size of the entire container.
Consider whether gzipped text is the right format. We could provide for random access on compressed index by self-indexing the index, but that's a far larger change.
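To make the read/base count idea concrete, here is a rough sketch of how a reader might consume such an index. It assumes the current .crai layout of six whitespace-separated integer columns in gzipped text (reference id, alignment start, alignment span, container byte offset, slice offset within the container, slice size). The two trailing columns (`n_reads`, `n_bases`) are purely hypothetical and not part of any specification, as are the names `CraiEntry`, `parse_crai` and `reads_per_reference`.

```python
import gzip
import io
from collections import defaultdict
from typing import NamedTuple, Optional


class CraiEntry(NamedTuple):
    ref_id: int
    aln_start: int
    aln_span: int
    container_offset: int
    slice_offset: int
    slice_size: int
    n_reads: Optional[int] = None  # hypothetical extra column, not in the current format
    n_bases: Optional[int] = None  # hypothetical extra column, not in the current format


def parse_crai(data: bytes):
    """Parse gzipped .crai bytes into a list of CraiEntry records."""
    entries = []
    with gzip.open(io.BytesIO(data), "rt") as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            # Accept the six existing columns plus up to two assumed extras.
            entries.append(CraiEntry(*(int(f) for f in fields[:8])))
    return entries


def reads_per_reference(entries):
    """Sum the assumed n_reads column per reference, idxstats-style."""
    totals = defaultdict(int)
    for e in entries:
        if e.n_reads is not None:
            totals[e.ref_id] += e.n_reads
    return dict(totals)
```

With today's six-column indices `reads_per_reference` returns an empty dict, which is the gap the extra columns would close. Note this only covers per-reference totals; unmapped and unplaced counts, as reported by idxstats, would need their own treatment.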
For completeness' sake, and so we don't forget, also consider adding the "missing" meta-information to CRAM indices. Re: pysam-developers/pysam#556.
I say "missing" because at the time of writing CRAM those extra fields in BAI were non-standard and undocumented anyway.