seqkit grep without converting to unique patterns #427

Closed
2 tasks done
alvanuffelen opened this issue Dec 11, 2023 · 1 comment
Comments

@alvanuffelen

Prerequisites

  • make sure you're using the latest version by seqkit version
  • read the usage

Describe your issue

I have a FASTQ file from which I would like to subsample specific sequences by ID. Some sequences should be extracted multiple times. However, seqkit grep with --pattern-file outputs each matching sequence only once, even when its ID is listed more than once.

seqkit grep -f id_list.txt mock.fq
[INFO] 3 patterns loaded from file

The file contains 4 patterns: 3 unique IDs, one of which (seq1) is listed twice. Would it be possible to add a parameter so that duplicate patterns are kept rather than collapsed into unique ones?
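
The gap between the 4 lines in the list and the 3 patterns reported in the log can be checked with standard tools. This is only a quick sketch: count_patterns is a hypothetical helper name, and it assumes one ID per line with blank lines ignored.

```shell
# Hypothetical helper: compare total vs. distinct non-empty lines in an ID list.
count_patterns() {
    # Count every non-empty line, duplicates included.
    total=$(grep -c . "$1")
    # Count distinct non-empty lines; tr strips the padding some wc builds add.
    unique=$(grep . "$1" | sort -u | wc -l | tr -d ' ')
    printf 'total=%s unique=%s\n' "$total" "$unique"
}
```

For the id_list.txt shown in this issue, count_patterns id_list.txt prints total=4 unique=3, which matches the "3 patterns loaded" message.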

In contrast, seqtk does not collapse the list to unique IDs:
seqtk subseq mock.fq id_list.txt
All 4 patterns are used, so the output contains 4 sequences.

mock.fq

@seq1
GATCGATCGA
+
IIIIIIIIII
@seq2
AGCTAGCTAG
+
IIIIIIIIII
@seq3
TACGTACGTA
+
IIIIIIIIII
@seq4
CGATCGATCG
+
IIIIIIIIII
@seq5
ATCGATCGAT
+
IIIIIIIIII
@seq6
GCTAGCTAGC
+
IIIIIIIIII
@seq7
CATGCATGCA
+
IIIIIIIIII
@seq8
TGCATGCATG
+
IIIIIIIIII
@seq9
AGCTAGCTAG
+
IIIIIIIIII
@seq10
ATCGATCGAT
+
IIIIIIIIII

id_list.txt

seq1
seq1
seq2
seq3
@shenwei356
Owner

Added

  -D, --allow-duplicated-patterns   output records multiple times when duplicated patterns are given
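
With a seqkit build that includes this flag, the command from the report would presumably become seqkit grep -D -f id_list.txt mock.fq. For installations that predate the flag, the requested behaviour can be approximated with awk. The function below is a rough sketch, not how seqkit implements it: extract_with_duplicates is a hypothetical name, and it assumes strict 4-line FASTQ records and an ID list with one ID per line, without the leading "@".

```shell
# Hypothetical fallback: print each FASTQ record once per occurrence of its ID.
# Usage: extract_with_duplicates ID_LIST FASTQ
extract_with_duplicates() {
    awk '
        # First file (ID list): count how often each ID is requested.
        NR == FNR { if ($1 != "") want[$1]++; next }
        # Header line of a record: strip "@" and look up the request count.
        FNR % 4 == 1 { n = want[substr($1, 2)] + 0; rec = $0; next }
        # Remaining lines of the record: append to the buffer.
        { rec = rec "\n" $0 }
        # Last line of the record: emit the buffered record n times.
        FNR % 4 == 0 { for (i = 0; i < n; i++) print rec }
    ' "$1" "$2"
}
```

Calling extract_with_duplicates id_list.txt mock.fq on the files from this issue yields four records, with seq1 appearing twice. Records come out in FASTQ order rather than list order, and multi-line FASTQ is not handled.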
