seqkit common/seqkit grep #416
Comments
Thanks for reporting this. The help message below needs to be updated as the search mechanism of
Updated:
In the log message below, the first number is the number of signatures. When searching by sequence, these are hash values of both the positive and negative strands. I shall make it clearer.
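To illustrate the point about signatures, here is a minimal Python sketch of how hashing both strands of each sequence can give roughly twice as many signatures as records, which would be consistent with 7830 being twice 3915 in the log line quoted in this issue. This is not seqkit's actual code: the helper names `reverse_complement` and `strand_signatures` are hypothetical, and MD5 is used purely for illustration.

```python
import hashlib

# Translation table for complementing DNA bases (assumes plain A/C/G/T input).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_signatures(seq):
    """Hypothetical helper: one hash per strand of a sequence.

    A sequence equal to its own reverse complement yields a single signature.
    """
    return {
        hashlib.md5(strand.upper().encode()).hexdigest()
        for strand in (seq, reverse_complement(seq))
    }


records = ["ACGTTGCA", "GATTACA", "CCCGGGAAA"]

signatures = set()
for seq in records:
    signatures |= strand_signatures(seq)

# Three records produce six signatures, two per record.
print(len(records), len(signatures))  # prints: 3 6
```

Under these assumptions, the number of records is the figure to read as "sequences in common", while the number of signatures counts both strands.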
This is a really quick and helpful response. Thanks a lot for the clarification. Your tool has been amazing and very useful!
…corrected numbers in the log. #416
The number is fixed.
Prerequisites
Describe your issue
Hi,
I have been trying to find a tool to compare reads across different FASTQ files and look for identical reads, as a possible indication of cross-contamination. I was trying to use seqkit common and seqkit grep.
This is the output of seqkit common,
I am a bit confused by this line: "[INFO] 7830 unique sequences found in 2 files, which belong to 3915 records in the first file: S1_R1_uniq.fq.gz". Does it mean that there are 3915 common sequences shared by the two FASTQ files?
I have also tried to use seqkit grep like this:
seqkit grep -s -f <(seqkit seq -s S2_R1_uniq.fq.gz) S1_R1_uniq.fq.gz > S2_S1_seqkit_grep.fastq
This seems to take longer than seqkit common in my case (each FASTQ file has roughly 250-380k reads).
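For anyone wanting to cross-check the result independently of seqkit, the task described above (finding reads in one FASTQ file whose sequence also occurs in another) can be sketched in plain Python with a set lookup. This is only a sketch under stated assumptions: records are exactly four lines each, matching is exact and on the same strand only, and the function names `read_fastq_sequences` and `shared_reads` are hypothetical. It says nothing about how seqkit common or seqkit grep work internally.

```python
import gzip


def read_fastq_sequences(path):
    """Yield (header, sequence) pairs from a 4-line-per-record FASTQ file.

    Files ending in .gz are opened with gzip; anything else as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file (or a blank line) ends the iteration
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line, ignored
            handle.readline()  # quality line, ignored
            yield header, seq


def shared_reads(first_path, second_path):
    """Return (header, sequence) pairs from the first file whose sequence
    also appears in the second file (exact match, same strand only)."""
    second = {seq for _, seq in read_fastq_sequences(second_path)}
    return [(h, s) for h, s in read_fastq_sequences(first_path) if s in second]
```

For example, `len(shared_reads("S1_R1_uniq.fq.gz", "S2_R1_uniq.fq.gz"))` would give the number of records in the first file with an identical sequence in the second. With a few hundred thousand reads per file, holding one file's sequences in a set should fit comfortably in memory.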