MD5SUM from picard and samtools do not match #1814
Comments
Some extra info: the problem seems to be that the genome file has different line breaks. After fixing them with […] But shouldn't […]
This definitely seems like a bug we should look into.
After looking at the relevant code, we believe that this issue only affects files with multi-byte line endings (e.g., Windows line endings).
@fgvieira I am having trouble reproducing this behavior. Are you able to share the fasta file with which you saw this issue?
It was a while ago, so I don't think I still have it, but you should be able to just create a multi-seq fasta file on Windows.
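For anyone trying to reproduce this without a Windows machine, here is a minimal sketch that builds the same multi-sequence FASTA twice, once with LF and once with CRLF line endings. The function names, sequence names, sequences, and output file names are all made up for illustration; they are not taken from the original genome.

```python
from pathlib import Path


def build_fasta(records, newline=b"\n", width=60):
    """Return FASTA bytes for (name, sequence) pairs using the given line ending."""
    lines = []
    for name, seq in records:
        lines.append(b">" + name.encode("ascii"))
        # Wrap the sequence at a fixed width, as most FASTA writers do.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width].encode("ascii"))
    # Terminate every line, including the last, with the chosen line ending.
    return b"".join(line + newline for line in lines)


def write_test_files(directory="."):
    """Write an LF and a CRLF copy of the same (hypothetical) two-sequence FASTA."""
    records = [("scaffold01", "ACGTN" * 30), ("scaffold02", "ggccaatt" * 20)]
    out = Path(directory)
    (out / "test_lf.fa").write_bytes(build_fasta(records, b"\n"))
    (out / "test_crlf.fa").write_bytes(build_fasta(records, b"\r\n"))
```

Calling write_test_files() produces two files that differ only in their line endings, which can then be passed to samtools dict and picard CreateSequenceDictionary to compare the resulting dictionaries.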
I can't repro this either - AFAICT CreateSequenceDictionary respects the line endings correctly (I've included my tests in a PR here). Since I've already done the work, we might as well keep them.
Have been trying to find the original file, but to no avail.
Bug Report
Affected tool(s)
CreateSequenceDictionary
Affected version(s)
Description
When creating a dictionary with samtools dict, I get:
Also if it is from the gzipped genome:
If I use picard CreateSequenceDictionary on the gzipped genome, I get the same:
But with the plain genome, the md5sum is different:
If I calculate the md5sum manually for (e.g.) scaffold01, I get f6412f880b27671e3789d5836f5803f1. It seems that there is something wrong with picard with plain genomes.
And both genome files are equal:
Expected behavior
I'd expect all dicts to be identical.
Actual behavior
The md5sums are different.
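The manual per-sequence check mentioned in the description can be sketched as follows. This assumes the tools follow the reference MD5 rule in the SAM specification (strip every character outside the printable range 33 to 126, uppercase the rest, then take the MD5); it is not the code of either picard or samtools, and the function name fasta_m5 is hypothetical. Under that rule, line endings should not affect the result.

```python
import hashlib


def fasta_m5(data: bytes) -> dict:
    """Map each sequence name in FASTA bytes to its MD5 hex digest.

    Assumption: the digest is computed as in the SAM spec's reference MD5
    rule, i.e. over the uppercased sequence with all bytes outside 33..126
    (spaces, tabs, CR, LF, ...) removed.
    """
    hashers = {}
    current = None
    # bytes.splitlines() splits on \n, \r\n and \r alike.
    for line in data.splitlines():
        if line.startswith(b">"):
            # The sequence name is the first whitespace-delimited token.
            fields = line[1:].split()
            name = fields[0].decode("ascii") if fields else ""
            current = hashers.setdefault(name, hashlib.md5())
        elif current is not None:
            kept = bytes(b for b in line if 33 <= b <= 126)
            current.update(kept.upper())
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Running this on the raw bytes of a plain FASTA and of the same file with converted line endings should give identical digests per sequence; a mismatch against a tool's M5 value would point at how that tool reads the file rather than at the sequence content.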