-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CrystalNN gives incorrect result for simple aromatic ring #3762
Comments
Hi @fxcoudert , have you tested some of other algorithms in the same module as well? I fear CrystalNN might be optimized mostly for typical inorganic crystals. Just a reminder: the alogrithm uses Voronoi partitioning, solid angles/distances, electronegativities and so on. I would assume the H atoms are not found, right? |
|
@fxcoudert should we add more documentation to the module and the respective classes to make clear that this is likely only working for inorganic/ionic crystals? If you look at the CrystalNN benchmark, it's also clear that it has never been benchmarked against molecular crystals (see https://pubs.acs.org/doi/10.1021/acs.inorgchem.0c02996) |
Historically, I have had to toggle some of the defaults to get CrystalNN to play nicely with MOF linkers, which I imagine was the source of the motivation for @fxcoudert's question. I can dig up the arguments, if it would be helpful. Of course, one would hope the defaults would suffice... |
Ah, I see! Sorry that I did not know that! At least, I figured out who to ping 😅. |
No I didn't know (or didn't remember) that. I just wanted to play with CrystalNN and see what it gave. I have seen it used on MOFs in a couple of papers, though:
@Andrew-S-Rosen if you know what parameters can give good results for MOFs, I'd be interested, indeed |
After looking again: I find it still puzzling that it does not consider the second C despite the fact that the bond lengths should be extremely similar. |
Ah, I was unclear in my phrasing. I meant that you were probably asking about an organic molecule because you're interested in MOF linkers. :) Edit: I now see you were indeed referring to a MOF! I only saw the image at first. This is what I get for responding to GitHub issues from my phone...
I did some digging and noticed that for the MOF Explorer App on the Materials Project, we used the following: cnn = CrystalNN(porous_adjustment=True, x_diff_weight=1.5, search_cutoff=4.5)
StructureGraph.with_local_env_strategy(structure, cnn).to_json() So, we were playing around with |
This is only tangentially related, but it also implies issues with C and |
I would like to add an additional weird behaviour of one of the other algorithms from the same module which relies on the MinimumDistance. It could be related as many functionalities are shared between the two classes. from pymatgen.core.structure import Structure
from pymatgen.analysis.local_env import JmolNN, CrystalNN, MinimumDistanceNN, VoronoiNN, MinimumOKeeffeNN
from collections import Counter
structure = Structure.from_file("MOF-71.cif")
nn = MinimumDistanceNN(0.5)
bonded_struct = nn.get_bonded_structure(structure)
len_neighbors = []
import numpy as np
for n, site in enumerate(bonded_struct.structure.sites):
connected_sites = len(bonded_struct.get_connected_sites(n))
if np.abs(site.coords[0] - 4.63463478) < 0.01 and np.abs(site.coords[1] - 1.18568268) < 0.01 and np.abs(
site.coords[2] - 8.95930359) < 0.01:
print(connected_sites)
print(bonded_struct.get_connected_sites(n))
len_neighbors.append(connected_sites)
counter = Counter(len_neighbors) I only get the 2nd C if I add a very large tolerance criterion, which is very weird. And the weight is much lower. The weights are different for the different C atoms even so it is a purely distance-based metric. |
I think I have a idea: it's due to the bonded structure. from pymatgen.core.structure import Structure
from pymatgen.analysis.local_env import JmolNN, CrystalNN, MinimumDistanceNN, VoronoiNN, MinimumOKeeffeNN
from collections import Counter
structure = Structure.from_file("MOF-71.cif")
nn =MinimumDistance()
print(nn.get_cn(structure, 76)) As expected, this gives 1. Same for CrystalNN() btw. I am not entirely sure where this comes from but potentially that both algorithms detect a bond from one side but not the other. Thus, changing the cutoffs should likely be enough. (Tested, My hypothesis for now: |
@fxcoudert I hope my answer is clear and helps. I don't think it is a bug. Just suboptimal weights for the C-H vs. C-C bond case. |
I have closed this issue now with an update to the documentation. Let us know if there are still any problems. |
Python version
Python 3.11.6
Pymatgen version
2023.10.4
Operating system version
macOS 14.4.1
Current behavior
Take this CIF file: https://gist.github.com/fxcoudert/f13193d91eacc7e3d67cf355f805ee36
Run this:
The code will output the number of neighbords detected for each atom by CrystalNN. Atoms C6, C7, C12, C13 are detected to have only two neighbours.
This is incorrect, they are carbon atoms in benzene rings (the ones in purple) and should be detected to have 3 neighboring atoms.
Expected Behavior
No carbon atom in this structure should be two-connected.
Minimal example
No response
Relevant files to reproduce this bug
No response
The text was updated successfully, but these errors were encountered: