MRG: Enable merged sigs, sequence range selection in urlsketch
#161
curious - what's the use case for ranges?
Some of the virus genomes in the ICTV VMR are provided as ranges of larger genomes. For example, the Fels2 sequence is within the [...]; here is the online feature description for Fels2 in that genome:
The reference range on the GenBank link doesn't exactly match the range given in the ICTV VMR, but I assume the VMR has potentially tuned their range for a reason.
cool, thx!
@ctb - not sure if you want to look at this again, a lot of testing + a few bugfixes happened since you reviewed
all good :)
Adds the following to `urlsketch`:
- Merged sigs: for multiple urls in the `url` column, download each url and sketch together in a single file. If saving the fasta files, write all data to the single `download_filename` provided in the input csv.
- Range selection: `range` is applied to all contigs/reads. This matches the behavior of SeqKit `subseq`. The range is applied both for sketching and for saving the downloaded file(s).

To generate the FASTA range test files:
seqkit subseq --region 1:50000 GCA_000175535.1_ASM17553v1_genomic.fna -o GCA_000175535.1_ASM17553v1_genomic.1-50000.fna
seqkit subseq --region 50000:100000 GCA_000175535.1_ASM17553v1_genomic.fna -o GCA_000175535.1_ASM17553v1_genomic.50000-100000.fna
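For reference, the per-record range behavior described above can be sketched in a few lines of Python. This is only an illustration, not the plugin's implementation: `apply_range` and the in-memory `(name, sequence)` records are hypothetical, the coordinates are assumed to be 1-based and inclusive as in `seqkit subseq --region`, and dropping records that fall entirely outside the range is an assumption rather than confirmed behavior.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (name, sequence); hypothetical in-memory record type


def apply_range(records: Iterable[Record], start: int, end: int) -> Iterator[Record]:
    """Trim every record to the 1-based, inclusive range start..end."""
    if start < 1 or end < start:
        raise ValueError(f"invalid range {start}:{end}")
    for name, seq in records:
        # Convert 1-based inclusive coordinates to a Python slice.
        # Slicing clamps automatically when the record is shorter than `end`.
        sub = seq[start - 1:end]
        if sub:  # assumption: skip records that lie entirely outside the range
            yield name, sub


records = [("contig1", "ACGTACGTAC"), ("contig2", "GGGCCC")]
trimmed = list(apply_range(records, 3, 8))
```

The same range is used for every contig/read, so a multi-record file yields one trimmed record per input record that overlaps the range.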
To do:
- `urlsketch` `--keep-fasta` for merged
- With `--keep-fasta` and a directory, we can check and/or build that directory, but if the filename includes a path that doesn't exist in that folder, we currently get an error. Fix by building out the path as needed; add a test.
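The "build out the path as needed" fix could look roughly like the following Python sketch. It is an illustration of the idea only: `prepare_output_path` and its parameters are hypothetical names, and the plugin itself may handle this differently.

```python
from pathlib import Path


def prepare_output_path(fasta_dir: str, download_filename: str) -> Path:
    """Return the output path, creating any missing parent directories."""
    out = Path(fasta_dir) / download_filename
    # Create the directory and any nested subdirectories named in
    # download_filename; do nothing if they already exist.
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```

With this approach, a `download_filename` such as `nested/sub/genome.fna` no longer fails when `nested/sub` is missing under the output directory.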