sourmash.exceptions.Io: Invalid checksum #3233
Comments
aieee that's not good! can you share the shell command(s) you are running, please? thank you! |
This comment was marked as outdated.
This comment was marked as outdated.
ok, thanks! nothing wrong with the commands, AFAICT! I'm wondering if maybe the downloads are corrupt? I ran
can you verify those results? Your md5sum should be the same. (You can also try running |
My md5sums for fungi and bacteria do not match yours:
Running your other commands only for these two, I get:
|
but your md5sum for protozoa matches? Weird... This is all consistent with your .zip files having some small amount of corruption; I would suggest just re-downloading them and trying again. Sorry for the hassle! Two other notes: first, we are about to release a new updated GenBank! I'll try to remember to mention it here when we do! second, we have much faster versions of gather now available in the branchwater plugin. You'll need to install with something like then, the revised command that should work:
if you get a chance to try it out (maybe on a small .zip file first?) and it doesn't work, please let me know! |
oh! sorry, no, you can only currently run |
I'll be glad to download the new updated GenBank when it is available! I've tested your fastgather command with the GTDB database, and I wish I had found it sooner. It is much faster (at least 10x). If you can provide suggestions for applying it to multiple .zip files, I would greatly appreciate it. |
Update: the new download of the bacteria file also fails. The md5sum I got this time was: 16b027bd1d3f934e4f1b84769f3d6a59 Not sure if it helps, but is the message when using this .zip with fastgather identifying the potentially corrupted signatures? (but I might not be interpreting this correctly)
|
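For anyone who wants to run the same kind of check from Python instead of the shell, here is a minimal sketch using only the standard library. It assumes the database is an ordinary .zip archive; the function names `md5_of_file` and `first_bad_member` are made up for illustration and are not part of sourmash. The first function computes the same digest that md5sum reports, and the second asks `zipfile` to verify each member's CRC and returns the name of the first one that fails, which may help narrow down where the corruption is.

```python
import hashlib
import zipfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_bad_member(path):
    """Return the name of the first zip member that fails its CRC check.

    Returns None if every member reads back cleanly. Raises
    zipfile.BadZipFile if the archive cannot be opened at all.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.testzip()


# Example use (the path is a placeholder, not a real file name):
#   print(md5_of_file("path/to/database.zip"))
#   print(first_bad_member("path/to/database.zip"))
```

Note that `testzip()` reads every member of the archive, so on a large database it can take a while. If the digest differs from the one the maintainers report, re-downloading is still the fix; this only helps confirm that the file on disk is the problem.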
yikes, I don't know what to do about the downloads! This is extremely strange, I've never in my life had repeated problems with downloads 😓 The errors above are just the typical errors of "wow this file is corrupted, I don't know what to do about it." This is very likely to be connected to either your Internet connection or the computer to which you are downloading it. Three thoughts -
If there is a public FTP site where I can drop a file, or you want me to upload it via dropbox or box, lmk. I think I can do the latter? Not sure how they like such big files these days. The last option I can think of is sending a USB stick - drop me an e-mail at [email protected] with a shipping address, and I'll see what I can do. I'd probably wait until the new genbank is out tho. |
That did it! I'd tried downloading with wget and by clicking the link, but didn't remember to use curl... Now, the files have finished downloading with the correct md5sum! Thank you for your help, and I'm sorry for taking your time with this trivial issue. I will be marking this as closed. I'll be sure to keep an eye out for the release of the updated GenBank and will read more about fastgather and how I could apply it to multiple .zip files. |
No worries, glad it wasn't our server! I'll post more answers in a separate issue soon! But the short version with fastgather is you have two options - First,
Second/alternative,
There's a few annoying mechanical steps in there that I need to work through. Will not take me very long. |
posted a tutorial here: #3239 and I learned something new - that I could use ask questions as you have them! |
Hi!
I am sequencing environmental and animal host samples to identify potential causes of infection, and I intended to test sourmash.
Most times I am unable to assemble the non-host reads, so I intend to classify my long reads.
I downloaded the genbank databases from here and this is what I tried running:
But I get:
I've managed to get results using the available gtdb-rs214 database, but I would really like to sort this out for GenBank.
Running the command separately for each database, I get the error for protozoa, fungi and bacteria.
Sorry for not getting this at once, pressed enter twice before :/