Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CrystalNN gives incorrect result for simple aromatic ring #3762

Closed
fxcoudert opened this issue Apr 16, 2024 · 13 comments · Fixed by #3764
Closed

CrystalNN gives incorrect result for simple aromatic ring #3762

fxcoudert opened this issue Apr 16, 2024 · 13 comments · Fixed by #3764
Labels

Comments

@fxcoudert
Copy link

Python version

Python 3.11.6

Pymatgen version

2023.10.4

Operating system version

macOS 14.4.1

Current behavior

  1. Take this CIF file: https://gist.github.com/fxcoudert/f13193d91eacc7e3d67cf355f805ee36

  2. Run this:

structure = Structure.from_file("/Users/fx/Documents/structures/MOFs/MOF-71.cif")
nn = CrystalNN()
bonded_struct = nn.get_bonded_structure(structure)
for n, site in enumerate(bonded_struct.structure.sites):
    print(site.label, len(bonded_struct.get_connected_sites(n)))
  1. The code will output the number of neighbords detected for each atom by CrystalNN. Atoms C6, C7, C12, C13 are detected to have only two neighbours.

  2. This is incorrect, they are carbon atoms in benzene rings (the ones in purple) and should be detected to have 3 neighboring atoms.

Capture d’écran 2024-04-16 à 22 05 46

Expected Behavior

No carbon atom in this structure should be two-connected.

Minimal example

No response

Relevant files to reproduce this bug

No response

@fxcoudert fxcoudert added the bug label Apr 16, 2024
@JaGeo
Copy link
Member

JaGeo commented Apr 16, 2024

Hi @fxcoudert ,

have you tested some of other algorithms in the same module as well? I fear CrystalNN might be optimized mostly for typical inorganic crystals.

Just a reminder: the alogrithm uses Voronoi partitioning, solid angles/distances, electronegativities and so on. I would assume the H atoms are not found, right?

@JaGeo
Copy link
Member

JaGeo commented Apr 16, 2024

JMolNN() seems to work at least for the C atoms mentioned above in that example. However, I would not trust the radii in there (they are horribly far away from the atomic radii in P, for example). You might want to provide a list of radii and tolerances yourself (e.g., atomic radii). I would assume that it should work very efficiently for molecular crystals. I am not sure there is currently any implementation directly in pymatgen or related package optimized for molecular crystals. (cc @Andrew-S-Rosen ?)

@JaGeo
Copy link
Member

JaGeo commented Apr 16, 2024

@fxcoudert should we add more documentation to the module and the respective classes to make clear that this is likely only working for inorganic/ionic crystals? If you look at the CrystalNN benchmark, it's also clear that it has never been benchmarked against molecular crystals (see https://pubs.acs.org/doi/10.1021/acs.inorgchem.0c02996)

@Andrew-S-Rosen
Copy link
Member

Historically, I have had to toggle some of the defaults to get CrystalNN to play nicely with MOF linkers, which I imagine was the source of the motivation for @fxcoudert's question. I can dig up the arguments, if it would be helpful. Of course, one would hope the defaults would suffice...

@JaGeo
Copy link
Member

JaGeo commented Apr 16, 2024

Historically, I have had to toggle some of the defaults to get CrystalNN to play nicely with MOF linkers, which I imagine was the source of the motivation for @fxcoudert's question. I can dig up the arguments, if it would be helpful. Of course, one would hope the defaults would suffice...

Ah, I see! Sorry that I did not know that! At least, I figured out who to ping 😅.

@fxcoudert
Copy link
Author

Historically, I have had to toggle some of the defaults to get CrystalNN to play nicely with MOF linkers, which I imagine was the source of the motivation for fxcoudert's question

No I didn't know (or didn't remember) that. I just wanted to play with CrystalNN and see what it gave. I have seen it used on MOFs in a couple of papers, though:

@Andrew-S-Rosen if you know what parameters can give good results for MOFs, I'd be interested, indeed

@JaGeo
Copy link
Member

JaGeo commented Apr 16, 2024

After looking again: I find it still puzzling that it does not consider the second C despite the fact that the bond lengths should be extremely similar.

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Apr 16, 2024

No I didn't know (or didn't remember) that. I just wanted to play with CrystalNN and see what it gave.

Ah, I was unclear in my phrasing. I meant that you were probably asking about an organic molecule because you're interested in MOF linkers. :) Edit: I now see you were indeed referring to a MOF! I only saw the image at first. This is what I get for responding to GitHub issues from my phone...

if you know what parameters can give good results for MOFs, I'd be interested, indeed

I did some digging and noticed that for the MOF Explorer App on the Materials Project, we used the following:

cnn = CrystalNN(porous_adjustment=True, x_diff_weight=1.5, search_cutoff=4.5)
StructureGraph.with_local_env_strategy(structure, cnn).to_json()

So, we were playing around with x_diff_weight and search_cutoff to address some issues with the organics. That said, the behavior reported in this issue is indeed odd. I would not rule out a bug or problem in general here, although I personally do not have time to look into the issue further. Hopefully someone else can!

@Andrew-S-Rosen
Copy link
Member

This is only tangentially related, but it also implies issues with C and CrystalNN (but this time on an inorganic surface): #2618

@JaGeo
Copy link
Member

JaGeo commented Apr 17, 2024

I would like to add an additional weird behaviour of one of the other algorithms from the same module which relies on the MinimumDistance. It could be related as many functionalities are shared between the two classes.

from pymatgen.core.structure import Structure
from pymatgen.analysis.local_env import JmolNN, CrystalNN, MinimumDistanceNN, VoronoiNN, MinimumOKeeffeNN
from collections import Counter

structure = Structure.from_file("MOF-71.cif")

nn = MinimumDistanceNN(0.5)
bonded_struct = nn.get_bonded_structure(structure)
len_neighbors = []
import numpy as np

for n, site in enumerate(bonded_struct.structure.sites):
    connected_sites = len(bonded_struct.get_connected_sites(n))
    if np.abs(site.coords[0] - 4.63463478) < 0.01 and np.abs(site.coords[1] - 1.18568268) < 0.01 and np.abs(
            site.coords[2] - 8.95930359) < 0.01:
        print(connected_sites)
        print(bonded_struct.get_connected_sites(n))
    len_neighbors.append(connected_sites)
counter = Counter(len_neighbors)

I only get the 2nd C if I add a very large tolerance criterion, which is very weird. And the weight is much lower. The weights are different for the different C atoms even so it is a purely distance-based metric.

@JaGeo
Copy link
Member

JaGeo commented Apr 17, 2024

I think I have a idea: it's due to the bonded structure.

from pymatgen.core.structure import Structure
from pymatgen.analysis.local_env import JmolNN, CrystalNN, MinimumDistanceNN, VoronoiNN, MinimumOKeeffeNN
from collections import Counter

structure = Structure.from_file("MOF-71.cif")

nn =MinimumDistance()
print(nn.get_cn(structure, 76))

As expected, this gives 1. Same for CrystalNN() btw.

I am not entirely sure where this comes from but potentially that both algorithms detect a bond from one side but not the other. Thus, changing the cutoffs should likely be enough. (Tested, nn =CrystalNN( x_diff_weight=1.5, search_cutoff=4.5) and it seems to work for this example as well)

My hypothesis for now:
If you use the CN tools, you get 1. Thus, some cutoff cuts off the C-C bonds for this particular C. One of the neighboring Cs has a bond to it... (the one bonded to 3Cs) but not the other (2Cs and 1 H) Then, in the graph, you will get a 2-fold coordination.... The graph should probably be a directed one but the output doesn't seem to care.

@JaGeo
Copy link
Member

JaGeo commented Apr 17, 2024

@fxcoudert I hope my answer is clear and helps. I don't think it is a bug. Just suboptimal weights for the C-H vs. C-C bond case.

@JaGeo
Copy link
Member

JaGeo commented Apr 18, 2024

I have closed this issue now with an update to the documentation. Let us know if there are still any problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants