-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Fix problem where tree search is truncated incorrectly. #244
Conversation
@luizirber your thoughts on the use of |
…of tests for pickling
…ug/sbt_similarity
sourmash_lib/search.py
Outdated
query_ksize = query.minhash.ksize | ||
|
||
# calculate the band size/resolution R for the genome | ||
R_metagenome = sourmash_lib.MAX_HASH / float(orig_query.minhash.max_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use get_scaled_for_max_hash
here
Looking good so far! |
This is substantial enough that maybe it can be merged now - it has tests for the critical bug that it fixes, and resolving merge conflicts is getting annoying. @luizirber thoughts? |
Seems like |
whoops! fixed. thx!
|
Huh. That's a weird py2.7 error.... |
@ctb random scipy installation error is gone, but now the test is failing: https://travis-ci.org/dib-lab/sourmash/jobs/292597698#L630 |
I can track down the error, pretty sure it is a division problem with py2... |
Codecov Report
@@ Coverage Diff @@
## master #244 +/- ##
==========================================
+ Coverage 87.08% 87.15% +0.07%
==========================================
Files 13 14 +1
Lines 2029 2087 +58
Branches 36 36
==========================================
+ Hits 1767 1819 +52
- Misses 261 267 +6
Partials 1 1
Continue to review full report at Codecov.
|
Fixes #200, where tree search is truncated on containment measures (number of shared hashes) rather than similarity.
The test in here uses
--best-only
to demonstrate the behavior because it is easy, but the bug will also show up in other searches; this is the line of code insbtmh.py
that needs to be fixed:In this line, we also need to take into the number of mins in the search signature. This requires modification of the data stored at each internal node of the SBT.
Additionally, this PR:
sourmash_lib.search
;--traverse-directory
tosourmash search
andgather
(fixes add a '--traverse-dir' argument to search and gather #257);--best-only
(fixes limit 'search --best-only' output to a single result #281);This PR need some cleanup as well - the
gather
functionality should be revisited before a merge is requested - although it does pass all tests.make test
Did it pass the tests?make coverage
Is the new code covered?without a major version increment. Changing file formats also requires a
major version number increment.
changes were made?