With SMI2GCS you can generate atomic descriptors from SMILES strings. The atomic descriptors are based on convolutions of CM5 atomic charges computed with the semiempirical tight-binding method GFN1-xTB.
More information about the method is available in the RegioML paper. Including: 1, 2, 3, 4, and 5.
We recommend using Anaconda to install the Python 3 environment:
conda env create -f environment.yml && conda activate smi2gcs
Then download the binaries of xtb version 6.4.0:
mkdir dep; cd dep; wget https://github.com/grimme-lab/xtb/releases/download/v6.4.0/xtb-210201.tar.xz; tar -xvf ./xtb-210201.tar.xz; cd ..
Sort each shell according to a modified version of the Cahn-Ingold-Prelog (CIP) priority rules, falling back to the CM5 charges if the CIP ordering is ambiguous:
1. Sort according to atomic number in descending order.
2. If (1) does not give a unique order, for each atom with the same priority (A*):
    i. Go to the bonded atoms that are not yet included and sum up their atomic numbers. Set the priority of A* according to this sum.
    ii. If (2i) does not give an unambiguous result, expand the shell of each atom A* by one bond.
    iii. Repeat (2ii) until a unique order is found.
3. If no unique order is found in (2) and all bonded atoms are included, sort the atoms according to their CM5 charges in descending order.
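The sorting rules above can be illustrated with a minimal Python sketch. This is not the SMI2GCS implementation: the function name `sort_shell`, the dictionary-based molecule representation, and the toy charges are all hypothetical. The sketch also assumes that each expansion step sums only the newly reached atoms, and it computes every expansion level up front instead of stopping at the first level that breaks a tie. Because the keys are compared level by level, the resulting order should match the lazy procedure.

```python
from typing import Dict, Iterable, List, Set


def sort_shell(
    shell: List[int],
    included: Iterable[int],
    neighbors: Dict[int, List[int]],
    atomic_num: Dict[int, int],
    charge: Dict[int, float],
) -> List[int]:
    """Return the atoms of `shell` in descending priority (illustrative only)."""
    sums: Dict[int, List[int]] = {}
    for atom in shell:
        # Atoms from earlier shells and from the current shell count as included.
        seen: Set[int] = set(included) | set(shell)
        frontier: Set[int] = {atom}
        levels: List[int] = []
        while True:
            # Steps 2i-2iii: move one bond outwards to atoms not yet included.
            frontier = {n for a in frontier for n in neighbors[a]} - seen
            if not frontier:
                break
            levels.append(sum(atomic_num[n] for n in frontier))
            seen |= frontier
        sums[atom] = levels

    # Pad with zeros so that every key has the same length.
    depth = max((len(v) for v in sums.values()), default=0)

    def key(atom: int):
        padded = sums[atom] + [0] * (depth - len(sums[atom]))
        # Step 1 (atomic number), step 2 (expansion sums), step 3 (charge).
        return (atomic_num[atom], *padded, charge[atom])

    return sorted(shell, key=key, reverse=True)


# Toy graph: atom 0 (C) is bonded to 1 (C), 2 (C) and 3 (O);
# atom 1 carries a Cl (atom 4) and atom 2 carries an F (atom 5).
neighbors = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5], 3: [0], 4: [1], 5: [2]}
atomic_num = {0: 6, 1: 6, 2: 6, 3: 8, 4: 17, 5: 9}
charge = {0: 0.1, 1: 0.0, 2: 0.05, 3: -0.3, 4: -0.1, 5: -0.2}  # made-up values

order = sort_shell([1, 2, 3], included={0}, neighbors=neighbors,
                   atomic_num=atomic_num, charge=charge)
print(order)  # [3, 1, 2]
```

In this toy example the oxygen comes first because of its atomic number. The two carbons tie in step 1 and are separated in step 2i, since the chlorine neighbour (17) outweighs the fluorine neighbour (9). The charges only matter when two atoms have identical environments at every expansion level.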
@article{Ree2022,
title = {RegioML: predicting the regioselectivity of electrophilic aromatic substitution reactions using machine learning},
volume = {1},
ISSN = {2635-098X},
url = {http://dx.doi.org/10.1039/D1DD00032B},
DOI = {10.1039/d1dd00032b},
number = {2},
journal = {Digital Discovery},
publisher = {Royal Society of Chemistry (RSC)},
author = {Nicolai Ree and Andreas H. G\"{o}ller and Jan H. Jensen},
year = {2022},
pages = {108--114}
}