Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and/or improve protein MinHash API. #720

Open
ctb opened this issue Aug 24, 2019 · 2 comments
Open

Document and/or improve protein MinHash API. #720

ctb opened this issue Aug 24, 2019 · 2 comments

Comments

@ctb
Copy link
Contributor

ctb commented Aug 24, 2019

As seen in #701, the way the MinHash object interface works for protein MinHashes is undocumented and quite possibly just stooopid.

A few issues:

  • per this comment, MinHash.add_sequence assumes the input is DNA and does a translation of the provided sequence before hashing it, rather than adding the protein k-mers directly. The current "correct" code to use is MinHash.add_protein which assumes the input is amino acids. This should be documented / demonstrated in the API documentation, doc/api-example.md.
  • related, perhaps add_sequence should be named add_dna_sequence...
  • separately, per this comment, MinHash.add_protein divides the k-mer size by 3 before generating the k-mers. This is also quite hidden behavior and at the least needs to be documented. Perhaps we should fix this somehow.
  • in the provided example in Setting "scaled=1" doesn't retrieve all protein k-mers #701, I'm not sure why add_sequence doesn't complain when it's given a bunch of non-ACTG characters. That should also be looked into.
@ctb
Copy link
Contributor Author

ctb commented Feb 4, 2021

ref #885 maybe

@ctb
Copy link
Contributor Author

ctb commented May 15, 2021

Bunch of stuff in this comment to pay attention to - #999 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant