Paired-data more distant than different samples #33

Closed
mw55309 opened this issue Jul 26, 2016 · 1 comment

mw55309 commented Jul 26, 2016

Hi

Following on from #32 where I (accidentally) sketched both reads of a pair, I get these curious results:

Mash sketch/dist sample against itself:

%> mash dist SRR1262629.mash.msh SRR1262629.mash.msh
SRR1262629_1.t.fastq.gz SRR1262629_1.t.fastq.gz 0 0 1000/1000
SRR1262629_2.t.fastq.gz SRR1262629_1.t.fastq.gz 0.0234513 0 440/1000
SRR1262629_1.t.fastq.gz SRR1262629_2.t.fastq.gz 0.0234513 0 440/1000
SRR1262629_2.t.fastq.gz SRR1262629_2.t.fastq.gz 0 0 1000/1000

Mash sketch/dist two samples against each other:

%> mash dist SRR1262629.mash.msh SRR1262625.mash.msh
SRR1262629_1.t.fastq.gz SRR1262625_1.t.fastq.gz 0.0166117 0 545/1000
SRR1262629_2.t.fastq.gz SRR1262625_1.t.fastq.gz 0.0189925 0 505/1000
SRR1262629_1.t.fastq.gz SRR1262625_2.t.fastq.gz 0.0174175 0 531/1000
SRR1262629_2.t.fastq.gz SRR1262625_2.t.fastq.gz 0.0212923 0 470/1000
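Each `mash dist` line above lists the reference ID, query ID, Mash distance, p-value, and shared hashes out of the sketch size. Mash derives the distance from the Jaccard estimate j = shared / sketch size as D = -(1/k) ln(2j / (1 + j)). The k-mer size is not shown in the output. The sketch below assumes Mash's default k = 21, which the reported numbers are consistent with, and recomputes the distance column from the shared-hash column. The helper names `mash_distance` and `parse_dist_line` are hypothetical, not part of Mash.

```python
import math

# Assumption: the sketches were built with Mash's default k-mer size, k = 21.
K = 21


def mash_distance(shared: int, sketch_size: int, k: int = K) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), with j = shared / sketch_size."""
    j = shared / sketch_size
    if j == 0:
        return 1.0  # no shared hashes: treated as maximally distant in this sketch
    return -math.log(2 * j / (1 + j)) / k


def parse_dist_line(line: str):
    """Split one `mash dist` line: reference, query, distance, p-value, shared/total."""
    ref, query, dist, pvalue, hashes = line.split()
    shared, total = (int(x) for x in hashes.split("/"))
    return ref, query, float(dist), float(pvalue), shared, total


output = """\
SRR1262629_2.t.fastq.gz SRR1262629_1.t.fastq.gz 0.0234513 0 440/1000
SRR1262629_1.t.fastq.gz SRR1262625_1.t.fastq.gz 0.0166117 0 545/1000
SRR1262629_2.t.fastq.gz SRR1262625_2.t.fastq.gz 0.0212923 0 470/1000"""

for line in output.splitlines():
    ref, query, reported, _, shared, total = parse_dist_line(line)
    recomputed = mash_distance(shared, total)
    print(f"{ref} vs {query}: reported {reported}, recomputed {recomputed:.7f}")
```

The recomputed values agree with the reported ones to about six decimal places. The distances therefore follow directly from the shared-hash counts, and the real question is why the two mates share fewer hashes than the cross-sample pairs.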

So when looking at read 1 and read 2 from the same sample (SRR1262629), they only share 440/1000 hashes.

When comparing different samples (SRR1262629 vs SRR1262625), they actually share more hashes (470 to 545)!

Is this expected??

Cheers
Mick

ondovb (Member) commented Jul 26, 2016

I think the issue is just that each read file of the SRR1262629 pair doesn't have enough coverage to saturate the genome (if I'm looking at the right stats, each should be <2x). When compared against a more saturated set, there would be more overlap, apparently enough to outweigh genomic variants.
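The coverage argument can be illustrated with a toy model. This is a sketch under stated assumptions, not how Mash itself works. Assume a Poisson coverage model, so a read set at coverage c observes about 1 - e^(-c) of a genome's k-mers. Assume the two genomes have the same number of distinct k-mers and share a fraction (1 - d)^k of them at per-base divergence d. Assume reads are sampled independently and contain no sequencing errors. Under those assumptions, the expected Jaccard index between two read sets can be computed directly. The function names and the example coverages and divergence are hypothetical.

```python
import math


def covered_fraction(coverage: float) -> float:
    """Expected fraction of a genome's k-mers seen at least once (Poisson coverage model)."""
    return 1.0 - math.exp(-coverage)


def expected_jaccard(cov_a: float, cov_b: float, divergence: float = 0.0, k: int = 21) -> float:
    """Toy expected Jaccard index between the k-mer sets of two read sets.

    Assumes both genomes have the same number of distinct k-mers, a fraction
    (1 - divergence)**k of which they share, independent Poisson sampling,
    and no sequencing-error k-mers. Sizes are in units of the genome's k-mer count.
    """
    shared_genome = (1.0 - divergence) ** k
    pa, pb = covered_fraction(cov_a), covered_fraction(cov_b)
    # k-mers present in both genomes AND observed in both read sets
    intersection = shared_genome * pa * pb
    union = pa + pb - intersection
    return intersection / union


# Hypothetical numbers: ~2x per mate of the same sample, versus a 10x read set
# from a genome differing at 0.1% of positions.
same_sample_mates = expected_jaccard(2, 2)
other_sample = expected_jaccard(2, 10, divergence=0.001)
print(f"R1 vs R2 of same sample: J ~ {same_sample_mates:.3f}")
print(f"R1 vs deeper, slightly divergent sample: J ~ {other_sample:.3f}")
```

Under these assumptions, the cross-sample estimate (about 0.83) exceeds the same-sample estimate (about 0.76). This is the pattern reported in the issue: higher coverage on one side adds more overlap than the small divergence removes. The model ignores sequencing-error k-mers. In real read sketches these add sample-specific hashes, which would likely push the overlap lower still.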

ondovb closed this as completed Mar 15, 2017