added kraken 1.1.1 to roary 3.13.0 dockerfile #260

kapsakcj merged 1 commit into master from cjk-roary-kraken on Jan 21, 2022

kapsakcj
Collaborator

@kapsakcj kapsakcj commented Oct 5, 2021

To address #254

Docker image is available temporarily on dockerhub under my personal repo kapsakcj/roary:3.13.0-plus-kraken

https://hub.docker.com/r/kapsakcj/roary/tags?page=1&ordering=last_updated

@erinyoung Can you try it out and let me know how it goes? Would appreciate a quick test (I don't have any data handy to do this), especially run via singularity.

One thing to note is that I'm trying something new: I've set LC_ALL=C.UTF-8 instead of the usual LC_ALL=C. I've read that this prevents headaches down the road, apparently related to character encoding.

If you have perl issues, we can revert to LC_ALL=C before merging this PR and rebuilding the staphb/roary:3.13.0 docker image.
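
One quick way to look for the perl issues mentioned above is to start perl under each locale and watch for its startup warning. This is only a sketch: `check_perl_locale` is a hypothetical helper, not something in the Dockerfile, and it assumes the only symptom of interest is perl's "Setting locale failed" warning on stderr.

```shell
# Hypothetical helper (not part of the Dockerfile in this PR): run a no-op perl
# one-liner under a given LC_ALL value and report whether perl complains.
check_perl_locale() {
  loc="$1"
  if ! command -v perl >/dev/null 2>&1; then
    echo "perl not found on PATH" >&2
    return 2
  fi
  # Keep only stderr: perl prints its locale warning there at startup
  warnings="$(LC_ALL="$loc" perl -e '1' 2>&1 >/dev/null)"
  case "$warnings" in
    *"Setting locale failed"*)
      echo "LC_ALL=$loc: perl could not set this locale"
      return 1
      ;;
  esac
  echo "LC_ALL=$loc: no perl locale warning"
}
```

Inside the container, something like `check_perl_locale C.UTF-8 || check_perl_locale C` would show whether falling back to LC_ALL=C is needed.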

@erinyoung
Contributor

I'm testing now. Will let you know if I run into issues.

@erinyoung
Contributor

Roary itself worked okay, but the kraken step did not. Nothing crashed, but this error is printed when roary tries to run kraken:

kraken-report: unable to find minikraken2_v2_8GB_201904_UPDATE in $KRAKEN_DB_PATH (undefined)

Even if I define KRAKEN_DB_PATH in my script or in the container, it still throws this error.

I am unsure as to why this is, so I'll be looking into it next week-ish.

@kapsakcj
Collaborator Author

kapsakcj commented Oct 8, 2021

Are you perhaps using a kraken2 database? If so, I don't think that will be compatible with kraken v1.
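
A rough way to tell the two database formats apart is to look at the files in the directory. The sketch below assumes the standard layouts (Kraken 1 builds write `database.kdb` and `database.idx`; Kraken 2 builds write `hash.k2d`, `opts.k2d` and `taxo.k2d`). `kraken_db_type` is a hypothetical helper name, not a kraken or roary command.

```shell
# Hypothetical helper: guess whether a directory holds a Kraken 1 or Kraken 2
# database from the file names each version's build step produces.
kraken_db_type() {
  db_dir="$1"
  if [ -f "$db_dir/database.kdb" ] && [ -f "$db_dir/database.idx" ]; then
    echo "kraken1"
  elif [ -f "$db_dir/hash.k2d" ] && [ -f "$db_dir/opts.k2d" ] && [ -f "$db_dir/taxo.k2d" ]; then
    echo "kraken2"
  else
    # Neither layout found: wrong path, incomplete download, or another format
    echo "unknown"
    return 1
  fi
}
```

For example, `kraken_db_type /path/to/db` prints `kraken1`, `kraken2` or `unknown`. It only checks that the files exist, so it cannot detect a corrupted database.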

@erinyoung
Contributor

I want to look into that.

I only got as far as trying kraken's 'minikraken_20141208' database, with the same results. For whatever reason, the environment variable KRAKEN_DB_PATH is never found.

I want to make sure that kraken itself works in the container - outside of roary.

@k-florek k-florek requested a review from erinyoung December 17, 2021 19:47
@k-florek k-florek self-assigned this Dec 17, 2021
@kapsakcj
Collaborator Author

OK, I was able to test this container with 3 Salmonella Hadar assemblies and got the optional kraken feature to run successfully.

It seems that roary is picky about the path to the kraken1 database and requires an absolute path to the directory containing the file database.kdb. If you use a relative path, roary will not be able to locate the database.kdb file.

The following command worked. I fed 3 .gff files into Roary:
roary -p 4 -qc -k ${PWD}/minikraken_20171013_4GB -f roary-kraken-container-test-take2 -e -n -v prokka-GCA_01*/*.gff

Relevant bits of STDOUT:

2022/01/16 17:49:21 Running Kraken on each input assembly
2022/01/16 17:49:21 sed -n '/##FASTA/,//p' /data/prokka-GCA_011245895.1/GCA_011245895.1.gff | grep -v '##FASTA' > /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011245895.1.fna
2022/01/16 17:49:21 sed -n '/##FASTA/,//p' /data/prokka-GCA_011348195.1/GCA_011348195.1.gff | grep -v '##FASTA' > /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011348195.1.fna
2022/01/16 17:49:21 sed -n '/##FASTA/,//p' /data/prokka-GCA_015698805.1/GCA_015698805.1.gff | grep -v '##FASTA' > /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_015698805.1.fna
2022/01/16 17:49:21 parallel --gnu -j 4 < /tmp/OErLwtcnI9
2022/01/16 17:49:21 kraken --fasta-input  --preload  --db /data/minikraken_20171013_4GB --output /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011245895.1.kraken /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011245895.1.fna  > /dev/null 2>&1
2022/01/16 17:49:21 kraken --fasta-input  --preload  --db /data/minikraken_20171013_4GB --output /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011348195.1.kraken /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011348195.1.fna  > /dev/null 2>&1
2022/01/16 17:49:21 kraken --fasta-input  --preload  --db /data/minikraken_20171013_4GB --output /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_015698805.1.kraken /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_015698805.1.fna  > /dev/null 2>&1
2022/01/16 17:49:21 parallel --gnu -j 4 < /tmp/o3Vul9C0G_
2022/01/16 17:49:48 kraken-report --db /data/minikraken_20171013_4GB /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011245895.1.kraken > /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011245895.1.kraken.report
2022/01/16 17:49:48 kraken-report --db /data/minikraken_20171013_4GB /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011348195.1.kraken > /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_011348195.1.kraken.report
2022/01/16 17:49:48 kraken-report --db /data/minikraken_20171013_4GB /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_015698805.1.kraken > /data/roary-kraken-container-test-take2_1642355359/EhYR7nu1Z9/GCA_015698805.1.kraken.report

Output kraken QC file

$ cat roary-kraken-container-test-take2_1642355359/qc_report.csv
Sample,Genus,Species
GCA_011245895.1,Salmonella,Salmonella enterica
GCA_011348195.1,Salmonella,Salmonella enterica
GCA_015698805.1,Salmonella,Salmonella enterica

The above command produced all of the expected roary output files.
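
Since roary seems to need an absolute path to the directory holding database.kdb, a small pre-flight check can catch a relative or wrong path before a long run. This is a minimal sketch: `abs_kraken_db` is a hypothetical helper, and the roary call in the comment just reuses the flags from the command above.

```shell
# Hypothetical helper: turn a (possibly relative) Kraken 1 database path into
# an absolute one and confirm database.kdb is there before calling roary.
abs_kraken_db() {
  db_dir="$1"
  if [ ! -d "$db_dir" ]; then
    echo "not a directory: $db_dir" >&2
    return 1
  fi
  # cd + pwd -P resolves relative paths and symlinks without needing realpath
  abs_dir="$(cd "$db_dir" && pwd -P)" || return 1
  if [ ! -f "$abs_dir/database.kdb" ]; then
    echo "database.kdb not found in $abs_dir" >&2
    return 1
  fi
  printf '%s\n' "$abs_dir"
}

# Example call (commented out so the snippet has no side effects):
# roary -p 4 -qc -k "$(abs_kraken_db minikraken_20171013_4GB)" -e -n -v prokka-GCA_01*/*.gff
```

The helper prints the resolved path on stdout and sends errors to stderr, so it can be dropped straight into the `-k` argument.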

@erinyoung
Contributor

I had some time today, and I can confirm that it works with kraken databases. My kraken database was corrupted when I tested it earlier. My apologies.

First, I pulled the container with singularity:

singularity pull --name roary_kraken docker://kapsakcj/roary:3.13.0-plus-kraken

Then I ran it:

singularity exec --bind /Volumes/IDGenomics_NAS/Data/kraken1/minikraken_20171013_4GB:/kraken_db --bind /Volumes/IDGenomics_NAS/ARLN/wastewater/Citrobacter/gff/test:/data roary_kraken roary -p 30 -qc -k /kraken_db -f /data/roary_testing  -e -n -v /data/F16.gff  /data/F18.gff  /data/F20.gff  /data/F22.gff  /data/F36.gff  /data/F42.gff  /data/F52.gff  /data/F54.gff /data/F55.gff  /data/F59.gff

And it worked!

$ cat /Volumes/IDGenomics_NAS/ARLN/wastewater/Citrobacter/gff/test/roary_testing/qc_report.csv 
Sample,Genus,Species
F16,Citrobacter,Citrobacter freundii
F18,Citrobacter,Citrobacter braakii
F20,Citrobacter,Citrobacter farmeri
F22,Citrobacter,Citrobacter freundii
F36,Citrobacter,Citrobacter freundii
F42,Citrobacter,Citrobacter freundii
F52,Citrobacter,Citrobacter freundii
F54,Citrobacter,Citrobacter braakii
F55,Citrobacter,Citrobacter farmeri
F59,Citrobacter,Citrobacter freundii
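
Because qc_report.csv is a plain three-column CSV (Sample,Genus,Species), a quick tally per species is easy to script. A minimal sketch, assuming the header and column order shown above and no commas inside fields; `species_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: count how many samples were assigned to each species
# in a roary qc_report.csv (header: Sample,Genus,Species).
species_counts() {
  report="$1"
  # Skip the header row, tally column 3, then sort by count (descending)
  # and by species name to keep the order stable for ties
  awk -F, 'NR > 1 && NF >= 3 { n[$3]++ }
           END { for (s in n) printf "%d\t%s\n", n[s], s }' "$report" |
    sort -k1,1nr -k2
}
```

Running `species_counts roary_testing/qc_report.csv` prints one tab-separated `count species` line per distinct species.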

I do have a concern, however: I used to use it with kraken2 databases, and I'm not sure how to add in that functionality. There doesn't seem to be any documentation about that.

Contributor

@erinyoung erinyoung left a comment


Works for me as well

@erinyoung erinyoung added the enhancement New feature or request label Jan 21, 2022
@kapsakcj
Collaborator Author

Glad to hear it ran successfully!

> I do have a concern, however: I used to use it with kraken2 databases, and I'm not sure how to add in that functionality. There doesn't seem to be any documentation about that.

Huh. I thought Roary pre-dated kraken2, so I didn't think there was support for it. Does that require kraken2 to be installed, or can you use kraken1 with a kraken2 database?

I was going to merge this PR and build on dockerhub since it's running properly with kraken1, but we can wait if you'd like to try it out with a kraken2 database. Let me know what you'd prefer!

@erinyoung
Contributor

I think it's fine to merge it. I am confused as well. My notes have roary using the kraken2 database, but I can't seem to replicate that. I'm willing to move on with my life.

@erinyoung erinyoung mentioned this pull request Jan 21, 2022
@kapsakcj
Collaborator Author

> I'm willing to move on with my life.

agreed 😆

I'll merge and rebuild the docker image on dockerhub and quay. I'm planning to rebuild the staphb/roary:3.13.0 and staphb/roary:latest images.

Thanks for testing and I hope this is useful for you and others!

@kapsakcj kapsakcj merged commit 375ba42 into master Jan 21, 2022
@kapsakcj kapsakcj deleted the cjk-roary-kraken branch January 21, 2022 16:54
SarahNadeau pushed a commit to SarahNadeau/docker-builds that referenced this pull request Jan 25, 2022
3 participants