Hi! The current version of the library provides the `get_alphabet_from_selfies` function, which generates a list of symbols from an iterable. However, if one uses the returned alphabet to generate an encoding (as seen below), the script raises an error saying that the symbol "." is not in the alphabet.
In the context of training a machine learning model, would this symbol provide vital information? If so, wouldn't it be better to preserve it in the alphabet?
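To make the failure mode concrete, here is a minimal, library-free sketch of the situation described above. The helpers `split_symbols`, `build_alphabet`, and `label_encode`, as well as the `keep_dot` flag, are hypothetical stand-ins written for illustration; they are not the selfies API and only assume that the alphabet builder drops "." while the encoder looks every symbol up in the alphabet.

```python
import re

# Hypothetical tokenizer (an assumption, not the library's code):
# a symbol is either a bracketed token such as "[C]" or the "." separator.
TOKEN_RE = re.compile(r"\[[^\]]*\]|\.")


def split_symbols(selfies):
    """Split a SELFIES-like string into bracketed symbols and "." separators."""
    return TOKEN_RE.findall(selfies)


def build_alphabet(selfies_iter, keep_dot=False):
    """Collect a sorted alphabet; "." is dropped unless keep_dot is True."""
    symbols = set()
    for s in selfies_iter:
        symbols.update(split_symbols(s))
    if not keep_dot:
        symbols.discard(".")
    return sorted(symbols)


def label_encode(selfies, alphabet):
    """Map each symbol to its alphabet index; unknown symbols raise KeyError."""
    symbol_to_idx = {sym: i for i, sym in enumerate(alphabet)}
    return [symbol_to_idx[sym] for sym in split_symbols(selfies)]


dataset = ["[C][O]", "[C].[N]"]

# Alphabet without ".": encoding a string that contains "." fails.
alphabet = build_alphabet(dataset)
try:
    label_encode("[C].[N]", alphabet)
    dot_failed = False
except KeyError:
    dot_failed = True

# Alphabet with "." preserved: the same string encodes without error.
alphabet_with_dot = build_alphabet(dataset, keep_dot=True)
encoded = label_encode("[C].[N]", alphabet_with_dot)
```

In this sketch, either keeping "." in the alphabet or making the encoder skip it would avoid the error. Which of the two is preferable depends on whether the fragment separator carries information the model should see.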
Hi @vandrw, thanks for your interest. The dot is used to represent two unconnected molecules, or unphysical bonds that cannot be represented natively in SMILES (e.g., ferrocenes). For generative models, we haven't encountered a use case for it yet. We only introduced it to be able to read even some ill-formed SMILES. Hope this helps!
After having a closer look at the functions involved, there are two possible solutions, which concern the following lines:

- selfies/selfies/utils/selfies_utils.py, line 71 in 120b776: `char_list.remove(".")`
- selfies/selfies/utils/encoding_utils.py, line 50 in 120b776