Run MMSeqs/FoldSeek with sequences that do not use amino acid sequences #945
Comments
What application do you have in mind? The main thing to do is to build a new substitution matrix. In addition to the substitution matrix, we also need background probabilities for the alphabet and a lambda value. Both were previously computed on the fly; however, we temporarily removed this code as part of the relicensing in release 16. We will return this functionality soonish though. We have an R script that also kind of does this, but it is not very complete or well tested:

The above is only true though if you want an alphabet with at most 20/21 letters. This assumption is pretty much baked into MMseqs2. A larger alphabet would probably require quite a bit of refactoring/cleanup.
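For illustration, here is a minimal Python sketch of how a log-odds substitution matrix and background probabilities could be derived from counts of aligned letter pairs. It is not the MMseqs2 code or the R script mentioned above. The function name `log_odds_matrix`, the pseudocount, the half-bit scaling, and the toy counts are all assumptions made for this example, and the returned lambda is only the nominal value implied by the scaling before rounding.

```python
import math


def log_odds_matrix(pair_counts, alphabet, scale=2.0, pseudocount=1.0):
    """Return (background, scores, nominal_lambda) from aligned pair counts.

    pair_counts maps (a, b) -> number of times a was aligned with b.
    Scores are round(scale * log2(q_ab / (p_a * p_b))), i.e. half-bit
    units for scale=2.0. Hypothetical helper, not an MMseqs2 API.
    """
    n = len(alphabet)
    index = {letter: i for i, letter in enumerate(alphabet)}

    # Start every cell at the pseudocount so no log of zero can occur.
    q = [[pseudocount] * n for _ in range(n)]
    for (a, b), count in pair_counts.items():
        i, j = index[a], index[b]
        # Split each count over both orientations to keep q symmetric.
        q[i][j] += count / 2.0
        q[j][i] += count / 2.0

    total = sum(sum(row) for row in q)
    q = [[value / total for value in row] for row in q]

    # Background probability of each letter is the row sum of q.
    background = [sum(row) for row in q]

    scores = [
        [
            round(scale * math.log2(q[i][j] / (background[i] * background[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]

    # scale * log2(r) == ln(r) / lambda  implies  lambda = ln(2) / scale.
    nominal_lambda = math.log(2) / scale
    return background, scores, nominal_lambda


# Toy two-letter example; the counts are made up for illustration.
toy_counts = {("A", "A"): 40, ("B", "B"): 40, ("A", "B"): 20}
toy_background, toy_scores, toy_lambda = log_odds_matrix(toy_counts, "AB")
```

With real data, the pair counts would come from trusted alignments of sequences written in the custom alphabet. Rounding to integers shifts the effective lambda slightly, so the nominal value should be treated as a starting point.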
Secondary Structure alphabet from STRIDE. So 8 letters. Do you think this is easily doable? Sorry we didn't get to meet up in Vancouver!
If you could provide me some instructions on how to implement what is needed I would be happy to help.
You can use and adapt the following script we made for Foldseek: Additionally, you will need to add more letters to the alphabet so that you still have an alphabet with 20 letters; all of these can have the same negative value as the X letter. Also, you will currently need to use MMseqs2 release 15 for this, as releases 16 and 17 temporarily removed support for custom substitution matrices.
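The padding step described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the Foldseek script: the helper `pad_alphabet`, the placeholder 8-letter alphabet `ABCDEFGH`, the toy scores, and the choice of filler letters are all hypothetical, and whether MMseqs2 accepts these particular letters has not been verified here.

```python
import string


def pad_alphabet(letters, scores, x_score, target_size=20):
    """Pad a small alphabet to target_size letters plus a trailing X.

    Every cell involving a filler letter or X gets x_score, mirroring the
    suggestion that filler letters share the negative value of X.
    Hypothetical helper, not part of MMseqs2 or Foldseek.
    """
    if "X" in letters:
        raise ValueError("X is reserved for the unknown letter")
    n_real = len(letters)
    n_pad = target_size - n_real
    if n_pad < 0:
        raise ValueError("alphabet is larger than target_size")

    # Pick unused uppercase letters as fillers (an arbitrary choice).
    used = set(letters) | {"X"}
    spare = [c for c in string.ascii_uppercase if c not in used]
    padded_letters = list(letters) + spare[:n_pad] + ["X"]

    size = len(padded_letters)
    padded = [[x_score] * size for _ in range(size)]
    # Copy the real scores into the top-left block.
    for i in range(n_real):
        for j in range(n_real):
            padded[i][j] = scores[i][j]
    return padded_letters, padded


# Placeholder 8-letter alphabet with toy scores: 4 for a match, -2 otherwise.
custom_letters = "ABCDEFGH"
custom_scores = [
    [4 if i == j else -2 for j in range(len(custom_letters))]
    for i in range(len(custom_letters))
]
padded_letters, padded_scores = pad_alphabet(custom_letters, custom_scores, x_score=-1)
```

The result is a 21 by 21 table (20 letters plus X) in which only the top-left 8 by 8 block carries real scores. It would still need to be written out in whatever matrix file format the chosen MMseqs2 release expects.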
Just like how FoldSeek replaces the 20 amino acids with 20 structure tokens, is there a way to hack MMSeqs/FoldSeek to use a custom vocab?
I want to make a mmseqs database of sequences that use a custom vocab and then search/cluster and make MSAs using the custom vocab.
Could you point me in a direction to do this?
Danny