Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple RDKitConverter #2775

Merged
merged 107 commits into from
Aug 21, 2020
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
107 commits
Select commit Hold shift + click to select a range
90fccce
simple converter
Jun 19, 2020
ba0f89e
add PDB residue info and guessers
Jun 22, 2020
b54ec60
fix 2 letter atom names and fix bond order
Jun 24, 2020
ac38ceb
test mol2 topology and pdb info
Jun 24, 2020
de5560b
autopep8
Jun 25, 2020
b00dee3
fix code review
Jun 25, 2020
495217d
store other attributes in each atom
Jun 25, 2020
eb5d365
more tests on topology + pep8
Jun 25, 2020
1774963
fix other props type
Jun 26, 2020
81eea81
bond order and charges
Jun 26, 2020
6d6a65b
fix indentation and bonds with atoms outside of atomgroup
Jun 26, 2020
cf0706c
fix test
Jun 26, 2020
d41fa5b
fix test minimal build
Jun 29, 2020
fe27f0b
add icodes to MonomerInfo
Jun 29, 2020
3590113
better attributes conversion
Jun 29, 2020
2d79014
test mda to rdkit MonomerInfo conversion
Jun 29, 2020
df660b7
Merge branch 'rdkit-converter' into rdkit-sanitize
Jun 29, 2020
68dc7ca
test mda to rdkit MonomerInfo conversion
Jun 29, 2020
f57e9e6
docs
Jun 29, 2020
99da261
fix minimal deps test
Jun 29, 2020
18a05ab
fix for minimal deps tests 🙏
Jun 30, 2020
1c14478
use util functions
Jun 30, 2020
5493fd1
Merge branch 'develop' into rdkit-converter
Jun 30, 2020
f5c798a
move min deps test up
Jun 30, 2020
558f7cd
capitalize elements by default
Jun 30, 2020
984f507
store index in MDA object
Jul 1, 2020
e582e63
fetch singular from _TOPOLOGY_ATTRS
Jul 1, 2020
931a81b
updated changelog
Jul 2, 2020
c432ea9
use ag intersection for bonds instead of try except
Jul 3, 2020
b0c39eb
more tests + pep8
Jul 3, 2020
7b002f1
Merge branch 'develop' into rdkit-converter
Jul 3, 2020
dc0c276
fix test for numpy 1.13.3
Jul 6, 2020
6b8cd9d
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Jul 6, 2020
0fd38b0
docs + pep8
Jul 7, 2020
25ee227
fix too long atom names in PDB output
Jul 7, 2020
cfce278
table of supported attributes for the converter
Jul 7, 2020
3717f57
changelog convention
Jul 7, 2020
499010a
catch no bonds
Jul 7, 2020
9604b6a
use match in pytest.raises and pytest.warns
Jul 7, 2020
30871ab
remove bondtype attribute
Jul 8, 2020
9c0a42e
fix bond attributes
Jul 8, 2020
66d644d
Merge branch 'rdkit-sanitize' into rdkit-converter
Jul 8, 2020
1bbfb09
simplified the code for infering
Jul 8, 2020
4a79cfd
include negative charges
Jul 9, 2020
bd88908
skip failing test for now
Jul 10, 2020
d9e088e
new test
Jul 10, 2020
f6aa736
update doc
Jul 10, 2020
b943af1
standardize groups
Jul 13, 2020
0105419
fix tests
Jul 13, 2020
7168e2d
add previously failing tests
Jul 13, 2020
de324fd
Merge branch 'develop' into rdkit-converter
Jul 13, 2020
d1f06f3
simpler pattern standardization
Jul 14, 2020
5a661f8
more tests on inferring and patterns
Jul 14, 2020
6e9c221
test atom properties and reaction mapping
Jul 14, 2020
b02fef6
fix docs
Jul 14, 2020
67dc9c2
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Jul 14, 2020
c8acda0
fix minimal build tests
Jul 15, 2020
9defa40
pep8
Jul 15, 2020
39ae42b
bfactors as TempFactor
Jul 15, 2020
ff45d3b
Update CHANGELOG
Jul 15, 2020
7d7f3f2
pep8
Jul 16, 2020
9d895bc
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Jul 16, 2020
f3f6f01
pep8
Jul 16, 2020
5d342db
no hydrogen warning
Jul 16, 2020
eff29a2
quickfix for polycyclic conjugated systems
Jul 17, 2020
5a37d84
Merge branch 'develop' into rdkit-converter
Jul 17, 2020
ca07796
save any atom property instead of just the ones tagged with "_MDAnaly…
Jul 20, 2020
28d04ed
general solution
Jul 21, 2020
ae73aff
back to old method for conjugated systems
Jul 21, 2020
8eaebbd
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Jul 21, 2020
425369e
Merge branch 'develop' into rdkit-converter
Jul 22, 2020
d6cd44d
added "RDKit" in changelog
Jul 22, 2020
1398f8c
fix when first atom is charged
Jul 22, 2020
e71a686
working general solution to conjugated systems
Jul 22, 2020
925bd7f
pep8
Jul 22, 2020
5a24ff6
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Jul 22, 2020
e5282f3
for loop instead of while
Jul 22, 2020
b7ac69e
bugfix for conjugated system with N at one edge
Jul 23, 2020
aba4b1b
fix typos
Jul 27, 2020
49f3bff
fix test for min deps
Jul 27, 2020
70cce99
use indirect parametrization
Jul 27, 2020
e82cf0a
add coordinates + tests + fix index property
Jul 29, 2020
3427c3c
cache the molecule and only update conformers when iterating trajectory
Jul 29, 2020
5218daa
update docs
Jul 30, 2020
3313b56
add kwargs to the cache key
Jul 30, 2020
a72a51b
use fstrings
Jul 31, 2020
76102fb
Merge branch 'develop' into rdkit-converter
orbeckst Aug 7, 2020
b9a25b2
Merge branch 'develop' into rdkit-converter
Aug 10, 2020
cec1f68
cache kwarg + fix cache tests
Aug 10, 2020
75f6262
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Aug 10, 2020
6c4c123
fix code review + improve docs
Aug 13, 2020
cdb60b2
improve doc
Aug 13, 2020
ebd0416
add test nan in coordinates
Aug 13, 2020
a1aab5b
warn nan coordinates
Aug 13, 2020
dcb8f33
end of code review
Aug 13, 2020
47b2755
corrections to smarts patterns for reactions
Aug 14, 2020
89efcd3
sort atoms by NUE + more standardization tests
Aug 14, 2020
a595e69
pdb atom names formating
Aug 14, 2020
f394e72
test RDKit->MDA->RDKit->MDA equal
Aug 14, 2020
e525e41
tackle conjugated systems ending with O-
Aug 14, 2020
9e22681
docs + pep8
Aug 14, 2020
eaf6887
tests
Aug 14, 2020
ded60c0
add max_iter kwarg
Aug 14, 2020
c18c7cd
Merge branch 'develop' into rdkit-converter
Aug 14, 2020
4ed0483
reorder docstring standardize_patterns
Aug 14, 2020
d288e87
Merge branch 'rdkit-converter' of https://github.com/cbouy/mdanalysis…
Aug 14, 2020
4e49616
fix doc build
Aug 14, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
147 changes: 145 additions & 2 deletions package/MDAnalysis/coordinates/RDKit.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,11 +55,45 @@
"""

import warnings
import re

import numpy as np

from ..exceptions import NoDataError
from ..topology.guessers import guess_atom_element
from . import memory

from . import base

try:
from rdkit import Chem
except ImportError:
pass
else:
RDBONDTYPE = {
'AROMATIC': Chem.BondType.AROMATIC,
'SINGLE': Chem.BondType.SINGLE,
'DOUBLE': Chem.BondType.DOUBLE,
'TRIPLE': Chem.BondType.TRIPLE,
}
RDBONDORDER = {
1: Chem.BondType.SINGLE,
1.5: Chem.BondType.AROMATIC,
"ar": Chem.BondType.AROMATIC,
2: Chem.BondType.DOUBLE,
3: Chem.BondType.TRIPLE,
}
# add string version of the key for each bond
RDBONDORDER.update({str(key):value for key,value in RDBONDORDER.items()})
RDATTRIBUTES = {
"altLoc": "AltLoc",
"chainID": "ChainId",
"name": "Name",
"occupancy": "Occupancy",
"resname": "ResidueName",
"resid": "ResidueNumber",
"segid": "SegmentNumber",
"tempfactor": "TempFactor",
}

cbouy marked this conversation as resolved.
Show resolved Hide resolved
class RDKitReader(memory.MemoryReader):
"""Coordinate reader for RDKit.
Expand Down Expand Up @@ -101,4 +135,113 @@ def __init__(self, filename, **kwargs):
warnings.warn("No coordinates found in the RDKit molecule")
coordinates = np.empty((1,n_atoms,3), dtype=np.float32)
coordinates[:] = np.nan
super(RDKitReader, self).__init__(coordinates, order='fac', **kwargs)
super(RDKitReader, self).__init__(coordinates, order='fac', **kwargs)


class RDKitConverter(base.ConverterBase):
"""Convert MDAnalysis AtomGroup or Universe to `RDKit <https://www.rdkit.org/docs/source/rdkit.Chem.rdchem.html#rdkit.Chem.rdchem.Mol>`_ :class:`rdkit.Chem.rdchem.Mol`.

Example
-------

.. code-block:: python

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB_full
u = mda.Universe(PDB_full)
mol = u.select_atoms('resname DMS').convert_to('RDKIT')


.. versionadded:: 2.X.X
cbouy marked this conversation as resolved.
Show resolved Hide resolved
"""

lib = 'RDKIT'
units = {'time': None, 'length': 'Angstrom'}

def convert(self, obj):
"""Write selection at current trajectory frame to :class:`~rdkit.Chem.rdchem.Mol`.

Parameters
-----------
obj : AtomGroup or Universe
"""
try:
from rdkit import Chem
except ImportError:
raise ImportError('RDKit is required for RDKitConverter but '
cbouy marked this conversation as resolved.
Show resolved Hide resolved
'is not installed. Try installing it with \n'
'conda install -c conda-forge rdkit')
try:
# make sure to use atoms (Issue 46)
ag = obj.atoms
except AttributeError as e:
raise TypeError("No `atoms` attribute in object of type {}, "
"please use a valid AtomGroup or Universe".format(
type(obj))) from e
cbouy marked this conversation as resolved.
Show resolved Hide resolved

mol = Chem.RWMol()
atom_mapper = {}

for atom in ag:
try:
element = atom.element
except NoDataError:
# guess atom element
# capitalize: transform CL to Cl and so on
element = guess_atom_element(atom.name).capitalize()
cbouy marked this conversation as resolved.
Show resolved Hide resolved
rdatom = Chem.Atom(element)
# add properties
mi = Chem.AtomPDBResidueInfo()
for attr, rdattr in RDATTRIBUTES.items():
try: # get value in MDA atom
cbouy marked this conversation as resolved.
Show resolved Hide resolved
value = getattr(atom, attr)
except AttributeError:
pass
else:
if isinstance(value, np.generic):
# convert numpy types to python standard types
value = value.item()
if attr == "segid":
cbouy marked this conversation as resolved.
Show resolved Hide resolved
# RDKit needs segid to be an int
try:
value = int(value)
except ValueError:
# convert any string to int
# can be mapped back with np.base_repr(x, 36)
value = int(value, 36)
elif attr == "name":
# RDKit needs the name to be properly formated for a
# PDB file (1 letter elements start at col 14)
name = re.findall('(\D+|\d+)', value)
if len(name) == 2:
symbol, number = name
else:
symbol, number = name[0], ""
value = "{:>2}".format(symbol) + "{:<2}".format(number)
# set attribute value in RDKit MonomerInfo
getattr(mi, "Set%s" % rdattr)(value)
rdatom.SetMonomerInfo(mi)
# TODO other properties (charges)
index = mol.AddAtom(rdatom)
# map index in universe to index in mol
cbouy marked this conversation as resolved.
Show resolved Hide resolved
atom_mapper[atom.ix] = index

try:
bonds = ag.bonds
except NoDataError:
cbouy marked this conversation as resolved.
Show resolved Hide resolved
ag.guess_bonds()
IAlibay marked this conversation as resolved.
Show resolved Hide resolved
bonds = ag.bonds

for bond in bonds:
bond_indices = [atom_mapper[i] for i in bond.indices]
try:
bond_type = bond.type.upper()
except AttributeError:
# bond type can be a tuple for PDB files
bond_type = None
bond_type = RDBONDTYPE.get(bond_type, RDBONDORDER.get(
bond.order, Chem.BondType.SINGLE))
mol.AddBond(*bond_indices, bond_type)

Chem.SanitizeMol(mol)
return mol
40 changes: 39 additions & 1 deletion testsuite/MDAnalysisTests/coordinates/test_rdkit.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
from numpy.testing import (assert_equal,
assert_almost_equal)

from MDAnalysisTests.datafiles import mol2_molecule
from MDAnalysisTests.datafiles import mol2_molecule, PDB_full

Chem = pytest.importorskip("rdkit.Chem")
AllChem = pytest.importorskip("rdkit.Chem.AllChem")
Expand Down Expand Up @@ -76,3 +76,41 @@ def test_compare_mol2reader(self):
assert_equal(universe.trajectory.ts.positions,
mol2.trajectory.ts.positions)


class TestRDKitConverter(object):
@pytest.fixture
def pdb(self):
return mda.Universe(PDB_full)

@pytest.fixture
def mol2(self):
return mda.Universe(mol2_molecule)

@pytest.mark.parametrize("sel_str", [
"resid 1",
"resname LYS and name NZ",
"resid 34 and altloc B",
])
def test_monomer_info(self, pdb, sel_str):
rdmol = Chem.MolFromPDBFile(PDB_full)
sel = pdb.select_atoms(sel_str)
umol = sel.convert_to("RDKIT")
atom = umol.GetAtomWithIdx(0)
mi = atom.GetMonomerInfo()

for mda_attr, rd_attr in mda.coordinates.RDKit.RDATTRIBUTES.items():
if mda_attr == "occupancy":
mda_attr = "occupancie"
rd_value = getattr(mi, "Get%s" % rd_attr)()
mda_value = getattr(sel, "%ss" % mda_attr)[0]
if mda_attr == "name":
rd_value = rd_value.strip()
elif mda_attr == "segid":
rd_value = np.base_repr(rd_value, 36)
assert rd_value == mda_value

def test_identical_topology_mol2(self, mol2):
# no chirality check
rdmol = mol2_mol()
umol = mol2.atoms.convert_to("RDKIT")
assert rdmol.HasSubstructMatch(umol) and umol.HasSubstructMatch(rdmol)