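I think the issue is just that each read file of SRR1262629 doesn't have enough coverage to saturate the genome (if I'm looking at the right stats, each should be under 2x). Two undersaturated read sets each observe a different random subset of the genome's k-mers, so they overlap poorly. When either one is compared against a more saturated set, more of its k-mers are found in the other set, apparently enough to outweigh genomic variants.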
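The coverage argument can be sanity-checked with a rough back-of-the-envelope model. This is a minimal sketch under loud assumptions, not anything Mash computes. It assumes reads land uniformly at random (a Poisson / Lander-Waterman style model), there are no sequencing errors or variants, and read length is 100 bp with k = 21. The read length is a hypothetical placeholder, since the real read length isn't stated here. It estimates the expected Jaccard index between the k-mer sets of two independent read sets from the same genome.

```python
import math


def expected_kmer_fraction(coverage: float, read_len: int = 100, k: int = 21) -> float:
    """Expected fraction of genome k-mers seen at least once.

    Assumes uniformly random read placement (Poisson model) and no
    sequencing errors. Base coverage is scaled by (L - k + 1) / L because
    a read of length L contains only L - k + 1 k-mers.
    """
    kmer_cov = coverage * (read_len - k + 1) / read_len
    return 1.0 - math.exp(-kmer_cov)


def expected_jaccard(cov_a: float, cov_b: float, read_len: int = 100, k: int = 21) -> float:
    """Approximate Jaccard index |A & B| / |A | B| between two independent read sets.

    Treating each genome k-mer as independently observed with probability
    pa (set A) and pb (set B): intersection ~ pa*pb, union ~ pa + pb - pa*pb.
    """
    pa = expected_kmer_fraction(cov_a, read_len, k)
    pb = expected_kmer_fraction(cov_b, read_len, k)
    return pa * pb / (pa + pb - pa * pb)


if __name__ == "__main__":
    # Hypothetical coverages: two ~2x files vs. a ~2x file against a ~20x set.
    print("2x vs 2x :", round(expected_jaccard(2, 2), 3))
    print("2x vs 20x:", round(expected_jaccard(2, 20), 3))
```

Under this toy model the 2x vs 20x comparison always scores a higher Jaccard than 2x vs 2x. That matches the intuition above: a more saturated comparison set "catches" more of the sparse set's k-mers. Real data add sequencing-error k-mers, which inflate the union and push both numbers down.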
Hi
Following on from #32 where I (accidentally) sketched both reads of a pair, I get these curious results:
Mash sketch/dist of the sample against itself:
%> mash dist SRR1262629.mash.msh SRR1262629.mash.msh
SRR1262629_1.t.fastq.gz SRR1262629_1.t.fastq.gz 0 0 1000/1000
SRR1262629_2.t.fastq.gz SRR1262629_1.t.fastq.gz 0.0234513 0 440/1000
SRR1262629_1.t.fastq.gz SRR1262629_2.t.fastq.gz 0.0234513 0 440/1000
SRR1262629_2.t.fastq.gz SRR1262629_2.t.fastq.gz 0 0 1000/1000
Mash sketch/dist two samples against each other:
%> mash dist SRR1262629.mash.msh SRR1262625.mash.msh
SRR1262629_1.t.fastq.gz SRR1262625_1.t.fastq.gz 0.0166117 0 545/1000
SRR1262629_2.t.fastq.gz SRR1262625_1.t.fastq.gz 0.0189925 0 505/1000
SRR1262629_1.t.fastq.gz SRR1262625_2.t.fastq.gz 0.0174175 0 531/1000
SRR1262629_2.t.fastq.gz SRR1262625_2.t.fastq.gz 0.0212923 0 470/1000
So when looking at read 1 and read 2 from the same sample (SRR1262629) they only share 440/1000 hashes.
When comparing different samples (SRR1262629 vs SRR1262625), they actually share more hashes (470 to 545)!
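For reference, the distance column in the output above is a monotone function of the shared-hash column. So "fewer shared hashes" and "larger distance" are the same observation, not two separate oddities. Below is a minimal sketch of the Mash distance formula from the Mash paper, D = -(1/k) ln(2j / (1 + j)), where j = shared hashes / sketch size. It assumes the sketches were built with Mash's default k = 21, which isn't stated in this issue. The j = 0 branch returns 1, which is my understanding of Mash's convention when nothing is shared.

```python
import math


def mash_distance(shared: int, sketch_size: int, k: int = 21) -> float:
    """Mash distance from a shared-hash count such as 440/1000.

    j is the Jaccard estimate from the bottom-k sketches; k is the k-mer
    size (assumed here to be Mash's default of 21).
    """
    j = shared / sketch_size
    if j == 0:
        return 1.0  # no shared hashes: maximal distance
    if j == 1:
        return 0.0  # identical sketches (avoid printing -0.0)
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    # Shared-hash counts copied from the mash dist output in this issue.
    for shared in (1000, 440, 545, 505, 531, 470):
        print(f"{shared}/1000 -> {mash_distance(shared, 1000):.7f}")
```

Plugging in the counts from the output reproduces the reported distances (e.g. 440/1000 gives about 0.02345 and 545/1000 about 0.01661), which is consistent with k = 21 having been used.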
Is this expected??
Cheers
Mick