Small (reference) data for testing #104
Comments
Do you think the Klebsiella pneumoniae genome that is used in the guide is small enough?
curl -LJO https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_009025895.1/download\?include_annotation_type\=GENOME_FASTA
The FASTA should be fine. I guess one could even use a subsequence of this genome to reduce the runtime and memory requirements of the test. But I was wondering more about the reference data that you have on Zenodo (i.e. that is downloaded with
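The subsequence idea mentioned in this comment could be sketched roughly as follows. This is only an illustration: `fasta_prefix`, `max_bases`, and `width` are hypothetical names, and keeping just the start of the first record is an assumption (the assembly may contain several records, and a truncated sequence may no longer contain any marker hits, so the result would need to be checked against geNomad's output).

```python
def fasta_prefix(fasta_text, max_bases, width=60):
    """Return FASTA text holding the first `max_bases` bases of the first record.

    Hypothetical helper for shrinking a test genome; later records are dropped.
    """
    header = None
    parts = []
    kept = 0
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first one
            header = line
            continue
        if header is None:
            raise ValueError("sequence data found before the first FASTA header")
        remaining = max_bases - kept
        if remaining <= 0:
            break
        chunk = line[:remaining]
        parts.append(chunk)
        kept += len(chunk)
    if header is None:
        raise ValueError("no FASTA record found")
    seq = "".join(parts)
    # Re-wrap the sequence to a fixed line width.
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + wrapped) + "\n"
```

The function works on text rather than on a file path, so it can be tried on a small string before being applied to the downloaded genome.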
You could use Within the database directory, the You could create an even smaller database if you create one containing only the markers with hits in the test sequence.
Wonderful.
Could you tell me where in the output I can find the IDs of the markers for the ids file?
To get a list of the accessions of markers with hits in the test genome:
awk -v FS="\t" 'NR>1 && $9!="NA" {print $9}' genomad_output/GCF_009025895.1_annotate/GCF_009025895.1_genes.tsv | sort -u
After you create the sub-database, it's not guaranteed that the matches will be the same, as the database size will change significantly. It should work for test purposes anyway.
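For anyone who prefers to script this step, here is a rough Python sketch of the same filter as the awk one-liner in this comment: skip the header, take the ninth tab-separated column, drop `NA`, and de-duplicate. The function name `marker_accessions` and the output file name in the usage comment are hypothetical; unlike awk, this version ignores rows with fewer than nine columns.

```python
def marker_accessions(lines, column=9, missing="NA"):
    """Return the sorted unique values of a 1-based tab-separated column.

    The first line is treated as a header and skipped, mirroring `NR>1`.
    Rows that are too short to contain the column are ignored.
    """
    found = set()
    for number, line in enumerate(lines, start=1):
        if number == 1:
            continue  # header row
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < column:
            continue
        value = fields[column - 1]
        if value != missing:
            found.add(value)
    return sorted(found)


# Example usage (the output file name is an assumption):
# with open("genomad_output/GCF_009025895.1_annotate/GCF_009025895.1_genes.tsv") as handle:
#     ids = marker_accessions(handle)
# with open("marker_ids.txt", "w") as out:
#     out.write("\n".join(ids) + "\n")
```

Python sorts by code point, so the order may differ from `sort -u` under some locales; the set of accessions should be the same.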
Excellent. Got it down to 23 MB (as tar.gz), which is still too large for our repo, but it will help a lot anyway. I would put this on Zenodo, or would you be interested in doing it with your account?
Great! One thing you can do to reduce the size of the database a bit and make the test faster is to reduce the search sensitivity in geNomad (setting I think it's best if you upload it yourself, since you'll be using it. But please share the link once it's up!
Thanks for the help. Here is the link: https://zenodo.org/records/11945948
Galaxy tool wrappers should be finished soon as well: Helmholtz-UFZ/galaxy-tools#29
Awesome! Thanks!
Is there any small reference data set (and FASTA) that could be used for testing?
Background: I'm thinking about creating a tool wrapper for Galaxy and those require tests.