Release v0.4
We introduced two major updates in v0.4.
- Expansion of the bacterial genome database from RefSeq (~180K genomes) to NCBI (~1.19M genomes). We encourage users to update to the larger database, as it further improves the polishing accuracy of both Homopolish and Modpolish: more closely related strains can be found among the 1.19 million genomes. Because the sketch size increased from 720Mb to 3.3Gb, more RAM is needed. Note that the v0.4 source code is tightly coupled to the larger database. Please do not use older versions of the code (v0.3.x or earlier) with the larger database. If you want to stay with an earlier version, please use the old bacteria sketch instead.
- Introduction of Modpolish. Modpolish aims to correct mismatch errors caused by novel modifications that the ONT basecalling models were not trained on. We observed that some bacteria produced an unexpectedly large number of mismatch errors due to such modifications, and these errors were corrected by neither Medaka nor Homopolish. Below are polishing results from two datasets: one contains 12 in-house Listeria strains, and the other is the Zymo Microbial Community.
Mismatches of 12 Listeria strains