[WIP] Fix multi-solvent MPcule building (#664)
* Make bonding not mandatory

* Fix bug for case where no tasks are valid

* Small type change

* Disallow certain task types in Q-Chem builder by default

* Add graph hashes to molecule document format

* Change SMD parameters for G2

* change id format

* Fixed bugs with graph hashes; added new ID class (needs to propagate through molecule world)

* Working hard at getting new ID format working

* Tests now pass on all builders; just need to make sure all corresponding tests pass in emmet-core (including remaking JSON files for summary)

* Updated tests; everything (I think) passes; black

* Now everything actually passes

* Remove unused variable

* mypy changes

* ID didn't match - trying without?

* Separate level of theory from solvent information; get rid of solvent names and instead create utility function to define a solvent string

* Create composite lot_solvent concept (capturing what we used to do with LevelOfTheory with pre-defined solvents)

* Updates to molecule doc (change to ID not done yet); now need to propagate to property documents

* Core should be updated post-solvent LOT changes (surprisingly easy).

Reminder: need to regenerate "builder_XXX_set.json" test files once builders are all done

* Need to regenerate test files

* Update thermo

* Bundling networkx 2.8.5 networkx.algorithms.graph_hashing.py source code to ensure long-term fixity of graph hashes.

* Tests

* Add species as top-level attribute

* Small change in filter_task_type; now arbitrary sorting functions can actually be used

* Changes to ID helper function, molecule grouper helper function, and small associated changes to molecule association builder

* MoleculeBuilder now updated to group based on the solvent used to optimize a molecule. Evaluation of different MoleculeDocs is based on the level of theory used to optimize as well as the overall "best lot"

* Small tweak

* Mostly black-ing, but also working on adding more detail about property docs

* Update all property docs (not builders yet) except Redox (which will be trickier)

* Expanded list of metals; small changes to redox builder

* Bugfixes to utils

* Update PartialChargesBuilder to allow, for a given molecule, different docs in different charges

* PartialSpinsBuilder updated

* Updated BondingBuilder; also made check for NBO7 more general across all instances

* Updated OrbitalBuilder

* Updated VibrationBuilder

* Progress on thermo builder; black in builders

* Still not done with ThermoBuilder, but need to change tack and go back to reaction land for now

* New thermo builder, allowing for single-point energy corrections, is finally done

* Some fixes for bugs found by @samblau

* Safety on BondingDoc creation to ensure that some method is always used

* RedoxDoc rewrite done; now need to update builder

* Some progress on RedoxBuilder; this one is kinda hairy, once again

* Draft of RedoxBuilder done

* Emergency backup commit

* Think this should update SummaryDoc constructor?

* Ready to start testing; that was less painful than I was expecting

* Minor fixes to core test based on changes to MPculeID format

* Small fixes

* Fixed ThermoBuilder (specifically dealing with edge case of single atoms)

* Fixed tricky bug in RedoxBuilder caused by use of defaultdict

* SummaryBuilder works; now just need to (re)write tests for all builders

* Weird bug: MongoDB really doesn't like doc keys to have periods in them

* Patched test for molecule builders; next need to rewrite tests for property builders

* bugfix in redox builder - IE/EA tasks weren't being populated properly

* Fixed redox emmet-core test

* Redox emmet-builder tests also finished

* Fixed summary builder tests; now just one more

* Summary test passes; we're golden

* Remove enums

* (hopefully) fix core linting issues

* Linting

* Trying to deal with mypy

* Now another linting change

* More mypy changes

* Re-adding test files (for now)

* Shuffling files around

* Re-add conftest

* Slight tweaks to fix SummaryBuilder for molecules

* Super broke things yesterday. Hoping this fixes them

* Realized I was a fool; updating tests now

* I think last test?

* All (relevant) tests pass

* Fix lint - also, really need to prevent enums from being added in changes

* Going for it

* mypy makes me want to break things sometimes

* Pin numpy in emmet-api

* Initial structuring of MPcules API

* First attempt at an API for the MPcules molecules collection

* Added some hint schemes; API for MPcules tasks collection

* Builder bugfix

* Testing import issues

* Modify requirements

* Seems the root of the issue is that eigen is missing?

* One more attempt before I call in the big guns

* Last try before I just gut these tests; not worth it

* jkjk

* Trying to figure out new bug on property builders - task docs apparently with no orig???

* Use debugger more effectively

* Trying to catch an error again

* Resolved issue?

* Reverting testing change; think all is well in the world.

* Floating validation on partial charges/spin docs

* Fix for H1 specifically (I think)

* Remove unnecessary printing

* Starting work on bond query

* Fix mypy issue

* Draft MPcules bonds API

* (limited) API for partial charges and spins (better API requires more detail in emmet-core docs)

* Begun work on NBO query operators

* Orbital query WIP

* First-draft, very simple summary API

* Working summary api endpoint

* Rewriting MoleculeBuilder to collapse MoleculeDocs from different solvents

* WIP - changing molecules builder

* (for now) commenting out orbital routes;

Adding tests for minimal MPcules API routes

* Tests for mpcules/molecule route pass

* Test for mpcules/bond query operators

* Lint issues

* Linters make code better and my life worse

* Fix mypy issue

* mypy issues

* Linting

* Mypy is a pain point

* Pain continues

* mypy was a mistake

* What happened to duck typing?

* Incorporating some changes from Sam Blau (@samblau); one bugfix, a couple of small tweaks

* Small bugfixes to resources; made a new MPcules app

* Fix mpcule_app

* Draft of new molecule builder written; needs testing (tests need to be rewritten in general)

* Small tweaks to property builders

* Bugfix from Sam; debugging some weirdness with MoleculesBuilder

* Fixed AssociationBuilder; deprecated molecules now processed appropriately

* New MoleculesBuilder works as expected!

* Looks like SummaryBuilder is working correctly now as well, though property builders may not be

* I think we got it! Just had to change update keys for property builders

* Updated builder tests

* Linting

* Found some more lint

* mypy nonsense

* The mypy gods were furious; hopefully this prayer appeases them

* Fix failing test

* Remove superfluous test file

* remove debugging tools

* Accounting for multiple possible task ID formats

* Lint

---------

Co-authored-by: Jason Munro <[email protected]>
espottesmith and Jason Munro authored Feb 25, 2023
1 parent f66e46e commit 413378c
Showing 28 changed files with 471 additions and 204 deletions.
45 changes: 32 additions & 13 deletions emmet-builders/emmet/builders/molecules/atomic.py
@@ -229,11 +229,22 @@ def process_item(self, items: List[Dict]) -> List[Dict]:
 best_entry = relevant_entries[0]
 task = best_entry["task_id"]

-task_doc = TaskDocument(
-    **self.tasks.query_one({"task_id": int(task),
-                            "formula_alphabetical": formula,
-                            "orig": {"$exists": True}})
-)
+tdoc = self.tasks.query_one({"task_id": task,
+                             "formula_alphabetical": formula,
+                             "orig": {"$exists": True}})
+
+if tdoc is None:
+    try:
+        tdoc = self.tasks.query_one({"task_id": int(task),
+                                     "formula_alphabetical": formula,
+                                     "orig": {"$exists": True}})
+    except ValueError:
+        tdoc = None
+
+if tdoc is None:
+    continue
+
+task_doc = TaskDocument(**tdoc)

 if task_doc is None:
     continue
@@ -277,7 +288,7 @@ def update_targets(self, items: List[List[Dict]]):
     # Neither molecule_id nor method need to be unique, but the combination must be
     self.charges.update(
         docs=docs,
-        key=["molecule_id", "method"],
+        key=["molecule_id", "method", "solvent"],
     )
 else:
     self.logger.info("No items to update")
@@ -488,15 +499,23 @@ def process_item(self, items: List[Dict]) -> List[Dict]:
 best_entry = relevant_entries[0]
 task = best_entry["task_id"]

-task_doc = TaskDocument(
-    **self.tasks.query_one({"task_id": int(task),
-                            "formula_alphabetical": formula,
-                            "orig": {"$exists": True}})
-)
+tdoc = self.tasks.query_one({"task_id": task,
+                             "formula_alphabetical": formula,
+                             "orig": {"$exists": True}})

-if task_doc is None:
+if tdoc is None:
+    try:
+        tdoc = self.tasks.query_one({"task_id": int(task),
+                                     "formula_alphabetical": formula,
+                                     "orig": {"$exists": True}})
+    except ValueError:
+        tdoc = None
+
+if tdoc is None:
     continue

+task_doc = TaskDocument(**tdoc)
+
 doc = PartialSpinsDoc.from_task(
     task_doc,
     molecule_id=mol.molecule_id,
@@ -538,7 +557,7 @@ def update_targets(self, items: List[List[Dict]]):
     # Neither molecule_id nor method need to be unique, but the combination must be
     self.spins.update(
         docs=docs,
-        key=["molecule_id", "method"],
+        key=["molecule_id", "method", "solvent"],
     )
 else:
     self.logger.info("No items to update")
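
The task lookup changed in both `process_item` hunks above follows one pattern: query with the task ID as stored on the entry, and only if that fails retry with the ID cast to `int`. A minimal sketch of that pattern as a standalone helper follows. `InMemoryStore`, `find_task`, and the `"mpcule-7"` ID are hypothetical names made up for illustration; the stand-in store only does exact-match lookups, so the `"orig": {"$exists": True}` criterion from the diff is left out.

```python
from typing import Any, Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a document store; only query_one is mimicked."""

    def __init__(self, docs: List[Dict[str, Any]]):
        self.docs = docs

    def query_one(self, criteria: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Exact-match lookup on every key in the criteria (no query operators).
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in criteria.items()):
                return doc
        return None


def find_task(store: InMemoryStore, task_id: Any, formula: str) -> Optional[Dict[str, Any]]:
    """Look up a task by its ID as given, then retry with an integer ID."""
    tdoc = store.query_one({"task_id": task_id, "formula_alphabetical": formula})
    if tdoc is None:
        try:
            tdoc = store.query_one(
                {"task_id": int(task_id), "formula_alphabetical": formula}
            )
        except ValueError:
            # Non-numeric string IDs cannot be cast to int; treat as not found.
            tdoc = None
    return tdoc


store = InMemoryStore([
    {"task_id": 101, "formula_alphabetical": "C1 O2"},          # integer ID
    {"task_id": "mpcule-7", "formula_alphabetical": "H2 O1"},   # made-up string ID
])

found_int = find_task(store, "101", "C1 O2")        # matched on the int retry
found_str = find_task(store, "mpcule-7", "H2 O1")   # matched on the first query
missing = find_task(store, "mpcule-9", "H2 O1")     # neither query matches
```

Callers still have to handle the `None` case, which the builders in this commit do with `continue`.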
23 changes: 17 additions & 6 deletions emmet-builders/emmet/builders/molecules/bonds.py
@@ -247,11 +247,22 @@ def process_item(self, items: List[Dict]) -> List[Dict]:
 best_entry = relevant_entries[0]
 task = best_entry["task_id"]

-task_doc = TaskDocument(
-    **self.tasks.query_one({"task_id": int(task),
-                            "formula_alphabetical": formula,
-                            "orig": {"$exists": True}})
-)
+tdoc = self.tasks.query_one({"task_id": task,
+                             "formula_alphabetical": formula,
+                             "orig": {"$exists": True}})
+
+if tdoc is None:
+    try:
+        tdoc = self.tasks.query_one({"task_id": int(task),
+                                     "formula_alphabetical": formula,
+                                     "orig": {"$exists": True}})
+    except ValueError:
+        tdoc = None
+
+if tdoc is None:
+    continue
+
+task_doc = TaskDocument(**tdoc)

 if task_doc is None:
     continue
@@ -294,7 +305,7 @@ def update_targets(self, items: List[List[Dict]]):
     # Neither molecule_id nor method need to be unique, but the combination must be
     self.bonds.update(
         docs=docs,
-        key=["molecule_id", "method"],
+        key=["molecule_id", "method", "solvent"],
     )
 else:
     self.logger.info("No items to update")
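
The `update_targets` hunks add `"solvent"` to the update key. A plausible reason, sketched below with a plain dict rather than the real store, is that documents are identified by the tuple of their key fields, so two documents for the same molecule and method in different solvents would otherwise overwrite each other. The `upsert` helper, the solvent strings, and the sample documents are all hypothetical; this is a simplified model, not the store's actual implementation.

```python
from typing import Any, Dict, List, Tuple


def upsert(collection: Dict[Tuple[Any, ...], Dict[str, Any]],
           docs: List[Dict[str, Any]],
           key: List[str]) -> None:
    """Insert or overwrite docs, identifying each doc by the given key fields."""
    for doc in docs:
        identity = tuple(doc[k] for k in key)
        collection[identity] = doc


# Two made-up bonding documents that differ only in solvent and bond list
docs = [
    {"molecule_id": "mol-a", "method": "nbo", "solvent": "NONE",
     "bonds": [(0, 1)]},
    {"molecule_id": "mol-a", "method": "nbo", "solvent": "SOLVENT=WATER",
     "bonds": [(0, 1), (1, 2)]},
]

# Old key: the second document silently replaces the first
old_key: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
upsert(old_key, docs, key=["molecule_id", "method"])

# New key: both documents are kept
new_key: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
upsert(new_key, docs, key=["molecule_id", "method", "solvent"])
```

With the two-field key only one document remains; with the three-field key both are stored.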
23 changes: 17 additions & 6 deletions emmet-builders/emmet/builders/molecules/orbitals.py
@@ -213,11 +213,22 @@ def process_item(self, items: List[Dict]) -> List[Dict]:
 for best in sorted_entries:
     task = best["task_id"]

-    task_doc = TaskDocument(
-        **self.tasks.query_one({"task_id": int(task),
-                                "formula_alphabetical": formula,
-                                "orig": {"$exists": True}})
-    )
+    tdoc = self.tasks.query_one({"task_id": task,
+                                 "formula_alphabetical": formula,
+                                 "orig": {"$exists": True}})
+
+    if tdoc is None:
+        try:
+            tdoc = self.tasks.query_one({"task_id": int(task),
+                                         "formula_alphabetical": formula,
+                                         "orig": {"$exists": True}})
+        except ValueError:
+            tdoc = None
+
+    if tdoc is None:
+        continue
+
+    task_doc = TaskDocument(**tdoc)

     if task_doc is None:
         continue
@@ -258,7 +269,7 @@ def update_targets(self, items: List[List[Dict]]):
     self.orbitals.remove_docs({self.orbitals.key: {"$in": molecule_ids}})
     self.orbitals.update(
         docs=docs,
-        key=["molecule_id"],
+        key=["molecule_id", "solvent"],
     )
 else:
     self.logger.info("No items to update")
69 changes: 46 additions & 23 deletions emmet-builders/emmet/builders/molecules/redox.py
@@ -5,9 +5,6 @@
 from math import ceil
 from typing import Any, Dict, Iterable, Iterator, List, Optional, Union

-from pymatgen.analysis.graphs import MoleculeGraph
-from pymatgen.analysis.local_env import OpenBabelNN
-
 from maggma.builders import Builder
 from maggma.core import Store
 from maggma.utils import grouper
@@ -17,7 +14,7 @@
 from emmet.core.molecules.bonds import metals
 from emmet.core.molecules.thermo import ThermoDoc
 from emmet.core.molecules.redox import RedoxDoc
-from emmet.core.utils import confirm_molecule, jsanitize
+from emmet.core.utils import confirm_molecule, get_graph_hash, jsanitize
 from emmet.builders.settings import EmmetBuildSettings


@@ -213,26 +210,54 @@ def process_item(self, items: List[Dict]) -> List[Dict]:
     continue

 ie_sp_task_ids = [
-    int(e["task_id"]) for e in gg.entries
+    e["task_id"] for e in gg.entries
     if e["charge"] == gg.charge + 1
     and e["task_type"] == "Single Point"
     and e["output"].get("final_energy")
 ]
-ie_tasks = [TaskDocument(**e) for e in self.tasks.query({"task_id": {"$in": ie_sp_task_ids},
-                                                         "formula_alphabetical": formula,
-                                                         "orig": {"$exists": True}
-                                                         })]
+ie_tasks = list()
+for i in ie_sp_task_ids:
+    tdoc = self.tasks.query_one({"task_id": i,
+                                 "formula_alphabetical": formula,
+                                 "orig": {"$exists": True}})
+
+    if tdoc is None:
+        try:
+            tdoc = self.tasks.query_one({"task_id": int(i),
+                                         "formula_alphabetical": formula,
+                                         "orig": {"$exists": True}})
+        except ValueError:
+            tdoc = None
+
+    if tdoc is None:
+        continue
+
+    ie_tasks.append(TaskDocument(**tdoc))

 ea_sp_task_ids = [
-    int(e["task_id"]) for e in gg.entries
+    e["task_id"] for e in gg.entries
     if e["charge"] == gg.charge - 1
     and e["task_type"] == "Single Point"
     and e["output"].get("final_energy")
 ]
-ea_tasks = [TaskDocument(**e) for e in self.tasks.query({"task_id": {"$in": ea_sp_task_ids},
-                                                         "formula_alphabetical": formula,
-                                                         "orig": {"$exists": True}
-                                                         })]
+ea_tasks = list()
+for i in ea_sp_task_ids:
+    tdoc = self.tasks.query_one({"task_id": i,
+                                 "formula_alphabetical": formula,
+                                 "orig": {"$exists": True}})
+
+    if tdoc is None:
+        try:
+            tdoc = self.tasks.query_one({"task_id": int(i),
+                                         "formula_alphabetical": formula,
+                                         "orig": {"$exists": True}})
+        except ValueError:
+            tdoc = None
+
+    if tdoc is None:
+        continue
+
+    ea_tasks.append(TaskDocument(**tdoc))

 grouped_docs = self._collect_by_lot_solvent(thermo_docs, ie_tasks, ea_tasks)
 if gg.charge in charges:
@@ -325,7 +350,7 @@ def update_targets(self, items: List[List[Dict]]):
     self.redox.remove_docs({self.redox.key: {"$in": molecule_ids}})
     self.redox.update(
         docs=docs,
-        key=["molecule_id"],
+        key=["molecule_id", "solvent"],
     )
 else:
     self.logger.info("No items to update")
@@ -339,7 +364,7 @@ def _group_by_graph(mol_docs: List[MoleculeDoc]) -> Dict[int, List[MoleculeDoc]]
 :return: Grouped molecule entries
 """

-mol_graphs_nometal: List[MoleculeGraph] = list()
+graph_hashes_nometal: List[str] = list()
 results = defaultdict(list)

 # Within each group, group by the covalent molecular graph
@@ -352,19 +377,17 @@ def _group_by_graph(mol_docs: List[MoleculeDoc]) -> Dict[int, List[MoleculeDoc]]
 mol_nometal.remove_species(metals)

 mol_nometal.set_charge_and_spin(0)
-mg_nometal = MoleculeGraph.with_local_env_strategy(
-    mol_nometal, OpenBabelNN()
-)
+gh_nometal = get_graph_hash(mol_nometal, node_attr="specie")

 match = None
-for i, mg in enumerate(mol_graphs_nometal):
-    if mg_nometal.isomorphic_to(mg):
+for i, gh in enumerate(graph_hashes_nometal):
+    if gh_nometal == gh:
         match = i
         break

 if match is None:
-    results[len(mol_graphs_nometal)].append(t)
-    mol_graphs_nometal.append(mg_nometal)
+    results[len(graph_hashes_nometal)].append(t)
+    graph_hashes_nometal.append(gh_nometal)
 else:
     results[match].append(t)
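
Once the comparison is plain string equality on a hash, the linear scan over `graph_hashes_nometal` in the hunk above can also be written as a dictionary lookup. The sketch below is an alternative formulation, not what the commit does; `group_by_hash` and the sample pairs are hypothetical, and the hashes are assumed to be precomputed strings (in the diff they come from `get_graph_hash`).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def group_by_hash(items: Iterable[Tuple[T, Hashable]]) -> Dict[int, List[T]]:
    """Group items whose hashes are equal; group indices follow first appearance."""
    index_of: Dict[Hashable, int] = {}
    groups: Dict[int, List[T]] = defaultdict(list)
    for item, graph_hash in items:
        # setdefault assigns the next free index the first time a hash is seen
        idx = index_of.setdefault(graph_hash, len(index_of))
        groups[idx].append(item)
    return dict(groups)


# Made-up (document name, precomputed hash) pairs
pairs = [("doc1", "abc"), ("doc2", "def"), ("doc3", "abc")]
grouped = group_by_hash(pairs)
```

The group indices match what the loop in the diff would assign, since both number groups in order of first appearance.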

6 changes: 6 additions & 0 deletions emmet-builders/emmet/builders/molecules/summary.py
@@ -3,6 +3,8 @@
 from math import ceil
 from typing import Any, Optional, Iterable, Iterator, List, Dict

+# from monty.serialization import loadfn, dumpfn
+
 from maggma.builders import Builder
 from maggma.core import Store
 from maggma.utils import grouper
@@ -284,6 +286,10 @@ def _group_docs(docs: List[Dict[str, Any]], by_method: bool = False):
 for td in to_delete:
     del d[td]

+# For debugging; keep because it might be needed again
+# dumpfn(d, f"{mol_id}.json.gz")
+# break
+
 summary_doc = SummaryDoc.from_docs(molecule_id=mol_id, docs=d)
 summary_docs.append(summary_doc)

35 changes: 24 additions & 11 deletions emmet-builders/emmet/builders/molecules/thermo.py
@@ -255,7 +255,7 @@ def _add_single_atom_enthalpy_entropy(task: TaskDocument, doc: ThermoDoc):

 thermo_docs = list()

-mm = MoleculeMatcher()
+mm = MoleculeMatcher(tolerance=0.000001)

 for mol in mols:
     this_thermo_docs = list()
@@ -277,7 +277,7 @@ def _add_single_atom_enthalpy_entropy(task: TaskDocument, doc: ThermoDoc):
 task_type = entry["task_type"]

 if (
-    task_type == "Single Point"
+    task_type in ["Single Point", "Force"]
     and entry["charge"] == mol.charge
     and entry["spin_multiplicity"] == mol.spin_multiplicity
 ):
@@ -307,11 +307,22 @@ def _add_single_atom_enthalpy_entropy(task: TaskDocument, doc: ThermoDoc):
 )[0]
 task = best["task_id"]

-task_doc = TaskDocument(
-    **self.tasks.query_one({"task_id": int(task),
-                            "formula_alphabetical": formula,
-                            "orig": {"$exists": True}})
-)
+tdoc = self.tasks.query_one({"task_id": task,
+                             "formula_alphabetical": formula,
+                             "orig": {"$exists": True}})
+
+if tdoc is None:
+    try:
+        tdoc = self.tasks.query_one({"task_id": int(task),
+                                     "formula_alphabetical": formula,
+                                     "orig": {"$exists": True}})
+    except ValueError:
+        tdoc = None
+
+if tdoc is None:
+    continue
+
+task_doc = TaskDocument(**tdoc)

 if task_doc is None:
     continue
@@ -337,9 +348,11 @@ def _add_single_atom_enthalpy_entropy(task: TaskDocument, doc: ThermoDoc):

 matching_structures = list()
 for entry in thermo_entries:
-    if (mm.fit(Molecule.from_dict(entry["molecule"]), Molecule.from_dict(best_spec["molecule"]))
-            and (sum(evaluate_lot(entry["level_of_theory"])) <
-                 sum(evaluate_lot(best_spec["level_of_theory"])))):
+    mol1 = Molecule.from_dict(entry["molecule"])
+    mol2 = Molecule.from_dict(best_spec["molecule"])
+    if ((mm.fit(mol1, mol2) or mol1 == mol2)
+            and (sum(evaluate_lot(best_spec["level_of_theory"])) <
+                 sum(evaluate_lot(entry["level_of_theory"])))):
         matching_structures.append(entry)

 if len(matching_structures) == 0:
@@ -416,7 +429,7 @@ def update_targets(self, items: List[List[Dict]]):
     self.thermo.remove_docs({self.thermo.key: {"$in": molecule_ids}})
     self.thermo.update(
         docs=docs,
-        key=["molecule_id"],
+        key=["molecule_id", "solvent"],
     )
 else:
     self.logger.info("No items to update")