-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple RDKitConverter #2775
Simple RDKitConverter #2775
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2775 +/- ##
===========================================
+ Coverage 92.93% 92.96% +0.03%
===========================================
Files 187 187
Lines 24507 24813 +306
Branches 3185 3234 +49
===========================================
+ Hits 22775 23067 +292
- Misses 1686 1698 +12
- Partials 46 48 +2
Continue to review full report at Codecov.
|
@cbouy since this seemd like WIP, I've been holding off on reviewing, do ping us when you'd like a review. |
@IAlibay @richardjgowers @fiona-naughton It's still missing quite some tests but I guess you can start having a look now |
Oooo that's some really cool stuff! Do make sure to include this (or a similar image) in your next blog :) |
This looks really cool! =) Would it be easier to call the conversion method This is just a quick thought I had looking at your code snippet, feel free to ignore! EDIT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On an initial read it looks good, here are a few initial comments & thoughts.
The cis/trans is rotating, I’m guessing this is because we’re not labelling
the bond? Iirc rdkit labels stereo around bonds as a group of atoms and a
stereo type right? This will probably be too tricky to quickly solve but
is worth raising warnings and an issue on the tracker about
Looks awesome nonetheless!
…On Wed, Jun 24, 2020 at 19:51, Irfan Alibay ***@***.***> wrote:
It's starting to look good!
Oooo that's some really cool stuff! Do make sure to include this (or a
similar image) in your next blog :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2775 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACGSGB26KWCBXCLFVQXXAVDRYJDKHANCNFSM4OC3WW7A>
.
|
I decided to follow what was already there for the parmed converter, but yes I completely agree that a |
Yes, although atom and bond stereochemistry can be inferred from the coordinates with |
I wasn't aware of the syntax for the |
pinging @lilyminium who will know better here, but I don't think there's a specific convention yet. Unless we have multiple converters already using the same naming style, I can't see a reason why we can't try to agree on a good naming scheme now. @cbouy it might be a good idea to bring this up an issue on this. As we have more and more "converters" it would be good to have a unified API. |
I like the current model of the metaclass registering the format. We could do a similar thing to TopologyAttrs though, and get the metaclass to add methods to AtomGroups. e.g. def __init__(cls, name, bases, classdict):
method_name = f'to_{cls.lib.lower()}'
setattr(AtomGroup, method_name, cls.convert)
setattr(Universe, method_name, cls.convert) The to_lib() method would show up in dir() etc and I think a fix to docstrings in sphinx is in the works. This would require ParmEdWriter to change too, though, so it's probably best as a separate issue @cbouy |
Currently I'm automatically guessing the |
So my worry here is that there are many ways in which elements can be guessed and hopefully in the future we'll want to provide guessers from things like mass, etc... instead of just atom names. Accounting for all of this should be outside the scope of the method in my opinion. Indeed, users not being aware of the API, and especially the assumptions our guessers make, might lead to a lot more unnoticed errors that will strongly impede scientific reproducibility. For elements, I think it might be a better call to instead give an error message that points users to the right part of the documentation / user guide (I recently wrote an addition to the guessers docstring to show how one could guess elements [and the dangers of doing so], I could probably port that to the user guide). Bonds are a different issue in my opinion, guessing should be rather consistent, but we should warn users that we are doing a guess. @fiona-naughton, @richardjgowers might have different opinions here. I'm also going to ping @jbarnoud and @lilyminium who have been very involved in the element guessing discussions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the one comment re: ordering of the docstring (plus maybe an extra line to clarify), otherwise assuming tests pass lgtm!
@richardjgowers and @fiona-naughton, ideally we should aim to get this merged soon. Could you have a look/final approval before then? Depending on how things go I'll squash merge on Monday evening.
Just restarted the azure-pipelines CI, should be fixed now. Could you look into this & build things locally to see if there's any further issues? |
@IAlibay should be good now 🤞 |
Looks good to me! |
So somehow I managed to mess up the squash-merge on this, which has led to a couple of things:
|
Were all of @cbouy 's previous merges squashed, i.e., would it be safe to remove the small commits visible in the history (eg Aug 14 in https://github.com/MDAnalysis/mdanalysis/commits/develop ) ? Answer: I think yes, because these are the same commits as the ones in this PR. It seems that GitHub double-merged this PR, once as squash+merge, the other as full merge. |
It does seem to be the case, although how this can be possible I have to admit goes way beyond my understanding of how git works. |
I think the easiest fix will be the following: First
This will reset develop. It seems that 6440873 is actually a merge and not a squash, so not a double merge/squash. We the need to work with the cbouy:rdkit-converter (or simply the diff 50cd6e7:6440873eeefa12078d07fc1eed61e2201df63841 ). I can then manually squash the diff. @IAlibay could you please write the log message and post it here? Sorry to make you do the work again. |
To recover this branch I do (after git pull and before resetting)
Once I have your commit message I will amend the commit message and make sure that @cbouy is the author. Then I can merge it into develop and push. Then I'll restore branch protection. Sounds ok? Going for lunch now and continue in ~1.5h... |
A slightly abbreviated version of the log message I originally wrote: Towards #2468 (RDKit interoperability GSOC project)
## Overview of work done in PR
This PR adds the `RDKitConverter` class which converts MDAnalysis
`Universe`/`Atomgroup` objects to `RDKit rdchem.Mol` objects.
## Limitations
- Bonds and elements must be present (the former will be inferred if not
present).
- This converter mainly aims at supporting cases where explicit hydrogens
are present.
## Extra implementation details
- Bond order and formal charges are inferred via atomic valencies & the
number of unpaired electrons (see `_infer_bo_and_charges`).
- In part due to the influence of atom read order on inferring, bond
conjugation and functional groups are standardized using SMARTS
reactions (see `_standardize_patterns`).
## Other changes
Also includes some PEP8 changes and some cleaning up of the tests for the
`RDKITReader`.
## References
For more information on the development process for this PR, see:
1) https://cbouy.github.io/blog/2020/07/01/rdkit-converter
2) https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2 |
... it's not easy to rebase anything with lots of merges ... it certainly does not "just do it". @MDAnalysis/coredevs I am going to reset develop now,
which will remove this PR but will still show this PR a merged. I am working on getting the code in this PR into a compact, mergeable state (i.e., not 100 commits). If anyone has any good ideas I am all ears. |
I was discussing this with a friend and I think we came up with a simple-ish way of doing this:
I think that might work (at least the git log seems reasonable?) although I'm not sure I properly managed to transfer the author. Edit: steps 1 & 2 made sense before the history re-write, I guess 4-6 might do the trick though. |
I just rewrote the develop history and removed all signs of PR #2775 |
I'll try I'll try your idea. |
With the clean develop
Looking good:
Check with
Then commit with @IAlibay 's log message and @cbouy as the author and create PR #2916
Please review PR #2916. I wanted to make sure that CI is still ok. |
Towards #2468 (RDKit interoperability GSOC project) PR #2916 -> Originally PR #2775 ##Overview of work done in PR This PR adds the RDKitConverter class which converts MDAnalysis Universe/Atomgroup objects to RDKit rdchem.Mol objects. ##Limitations - Bonds and elements must be present (the former will be inferred if not present). - This converter mainly aims at supporting cases where explicit hydrogens are present. ##Extra implementation details - Bond order and formal charges are inferred via atomic valencies & the number of unpaired electrons (see _infer_bo_and_charges). - In part due to the influence of atom read order on inferring, bond conjugation and functional groups are standardized using SMARTS reactions (see _standardize_patterns). ##Other changes Also includes some PEP8 changes and some cleaning up of the tests for the RDKITReader. ##References For more information on the development process for this PR, see: https://cbouy.github.io/blog/2020/07/01/rdkit-converter https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2
Part of the fixes for #2468
Changes made in this Pull Request:
Inspired from the ParmEdConverter
PR Checklist