You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As seen in #701, the way the MinHash object interface works for protein MinHashes is undocumented and quite possibly just stooopid.
A few issues:
per this comment, MinHash.add_sequence assumes the input is DNA and does a translation of the provided sequence before hashing it, rather than adding the protein k-mers directly. The current "correct" code to use is MinHash.add_protein which assumes the input is amino acids. This should be documented / demonstrated in the API documentation, doc/api-example.md.
related, perhaps add_sequence should be named add_dna_sequence...
separately, per this comment, MinHash.add_protein divides the k-mer size by 3 before generating the k-mers. This is also quite hidden behavior and at the least needs to be documented. Perhaps we should fix this somehow.
As seen in #701, the way the MinHash object interface works for protein MinHashes is undocumented and quite possibly just stooopid.
A few issues:
MinHash.add_sequence
assumes the input is DNA and does a translation of the provided sequence before hashing it, rather than adding the protein k-mers directly. The current "correct" code to use isMinHash.add_protein
which assumes the input is amino acids. This should be documented / demonstrated in the API documentation,doc/api-example.md
.add_sequence
should be namedadd_dna_sequence
...MinHash.add_protein
divides the k-mer size by 3 before generating the k-mers. This is also quite hidden behavior and at the least needs to be documented. Perhaps we should fix this somehow.add_sequence
doesn't complain when it's given a bunch of non-ACTG characters. That should also be looked into.The text was updated successfully, but these errors were encountered: