Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quasi-RRHO Thermochemistry Analysis Module #2028

Merged
merged 40 commits into from
Aug 21, 2023

Conversation

arepstein
Copy link

Summary

Added quasirrho.py to the analysis subpackage to calculate the Quasi-RRHO free energy from a Gaussian or QChem frequency calculation.

  • Calculates Grimme's Quasi-RRHO free energy
  • Option for also correcting for solvent concentration

Additional dependencies introduced (if any)

*None

TODO (if any)

  • The rotational symmetry number is required as an input. To calculate this on the fly, will consider updates to PointGroupAnalyzer

Before a pull request can be merged, the following items must be checked:

Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is
highly recommended that you use the pre-commit hook provided in the pymatgen
repository. Simply cp pre-commit .git/hooks and a check will be run prior to
allowing commits.

@arepstein arepstein changed the title Readyto pr Quasi-RRHO Thermochemistry Analysis Module Jan 7, 2021
from pymatgen.io.gaussian import GaussianOutput
from pymatgen.io.qchem.outputs import QCOutput

test_dir = os.path.join(os.path.dirname(__file__), "..", "..", "..",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using module level test_dir, pls inherit from PymatgenTest for unit tests and then use self.test_dir instead. This will ensure correct behavior. Thanks.

@mkhorton
Copy link
Member

@arepstein is this PR ready for review to be merged?

@arepstein
Copy link
Author

I believe so. The only pending update in my mind is automated detection of the rotational symmetry number, but I think it's okay as an input parameter for now.

from pymatgen.io.qchem.outputs import QCOutput

# Define useful constants
kb = 1.380662E-23 # J/K
Copy link
Member

@mkhorton mkhorton Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these already defined in pymatgen.core.units?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like yes! I was not aware of pymatgen units -- will fix and commit the changes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the units I'm using are slightly different and don't lend themselves well to the subclasses in pymatgen.core.units. For example, converting amuangs^2 to kgm^2 for moments of inertia. Would you recommend still using Units subclasses when I can?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend supplementing the existing units if you don't have the ones you want.

To be specific, I'm talking about the units constants in pymatgen.core.units specifically, rather than suggesting you use the classes in that file.

The reason for this recommendation is simply that if we expand our unit functionality later (we've explored a few options here), it's a lot easier for us to spot where units are being used in pymatgen if unit constants are imported from the same place.

Alex Epstein added 3 commits June 10, 2021 16:15
Tried to use more built-in units. Added functionality for checking if linear and adjusting rotational entropy accordingly. Added testing for linear molecule
@htz1992213
Copy link
Contributor

Hey @arepstein , I get to know this PR in today's subgroup meeting. I am just wondering how the functionalities here comparing to those in the Goodvibes package https://github.com/bobbypaton/GoodVibes?

@arepstein
Copy link
Author

Hey @arepstein , I get to know this PR in today's subgroup meeting. I am just wondering how the functionalities here comparing to those in the Goodvibes package https://github.com/bobbypaton/GoodVibes?

Thanks for pointing this out, I didn't know about Goodvibes when I put in this PR. This PR should be identical to the Grimme Quasi-RRHO approximation for the entropy that's implemented in GoodVibes, but does not include any other methods that GoodVibes implements. One big difference I see in terms of implementation into pymatgen infrastructure is that this PR can calculate Quasi-RRHO entropeis for Q-Chem output files, Gaussian output files, or manual input parameters, which is useful for Atomate integration. It would be a good idea to check that this agrees with GoodVibes.

@coveralls
Copy link

coveralls commented Mar 4, 2022

Coverage Status

Coverage decreased (-0.7%) to 83.428% when pulling d08a1ca on arepstein:readytoPR into 3376f27 on materialsproject:master.

@janosh janosh added the stale Abandoned or conflicting PRs and outdated issues label Oct 23, 2022
Copy link
Contributor

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @arepstein ! I used it recently and I think it's a very useful addition to the molecular DFT infrastructure in pymatgen. I made a few comments; if you can pull in the latest pymatgen and address these hopefully we can get @janosh or another maintainer to review soon; I know this one has been open for a while.

Comment on lines 36 to 38
# Define useful conversion factors
kcal2hartree = 0.0015936 # kcal/mol to hartree/mol

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to define this conversion factor using scipy.constants? If not, just make sure you carry enough decimal places to not lose precision due to the very different magnitudes of Ha and kcal.

class QuasiRRHO:
"""
Class to calculate thermochemistry using Grimme's Quasi-RRHO approximation.
All outputs are in atomic units.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be explicit about what atomic units are? For thickheaded engineers like myself, I have to go and google this :)


class QuasiRRHO:
"""
Class to calculate thermochemistry using Grimme's Quasi-RRHO approximation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May add the citation to the paper in this docstring so that it's visible in, e.g., Jupyter Notebook and documentation pages?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding to the docstring! Since you opened this PR, we have also added a nice system for citations in pymatgen based on duecredit. Would you mind adding a @due.dcite decorator to this class? (example)

Comment on lines 75 to 84
:param output: Requires input of a Gaussian output file,
QChem output file, or dictionary of necessary inputs:
{"mol": Molecule, "mult": spin multiplicity (int),
"frequencies": list of vibrational frequencies [a.u.],
elec_energy": electronic energy [a.u.]}
:param sigma_r (int): Rotational symmetry number
:param temp (float): Temperature [K]
:param press (float): Pressure [Pa]
:param conc (float): Solvent concentration [M]
:param v0 (float): Cutoff frequency for Quasi-RRHO method [cm^1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, please convert to a google style docstring

Since all of the outputs are stored as class attributes, it would also be especially nice to have a Attributes section that describes each output, with units

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could define python @property methods for each of the relevant outputs (e.g., g_corrected, h_corrected`, which would let you give each one an explicit docstring. Each property would not do anything other than contain the docstring, e.g.

@property
def g_corrected(self):
'''
Corrected free energy in Ha
'''
return self.g_corrected

so the only benefit to this structure would be visibility of the outputs to users

@rkingsbury
Copy link
Contributor

See also some small edits I made when I used the code, in a PR against your fork: arepstein#1

@rkingsbury
Copy link
Contributor

I will also note for other reviewers that although the GoodVibes package contains similar functionality, it only accepts Gaussian output files, and hence is not useful with our high-throughput Q-Chem infrastructure. So this PR is valuable because it gives us those capabilities.

@janosh
Copy link
Member

janosh commented May 8, 2023

@arepstein Is this still WIP?

@arepstein
Copy link
Author

Hey @rkingsbury and @janosh, I've incorporated Ryan's edits and recommendations. While making these edits it became clear that a Class might not be the best choice for QuasiRRHO. As it stands now, it might be better as a simple function. Ideally, I think it could be nice to have a Thermochemistry class that is a Molecule plus some extra information like frequencies. We could then have different treatments of thermochemistry, like GoodVibes implements, but in a way works well with our ecosystem.

That being said, hopefully QuasiRRHO is still useful for now and can be a starting point for later changes if desired.

Copy link
Contributor

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arepstein and nice work on this. Should be a good addition to pymatgen.

from pymatgen.io.qchem.outputs import QCOutput

# Define useful constants
kb = kb_ev * const.eV # Pymatgen kb [J/k]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[J/k] -> [J/K]

kcal2hartree = 1000 * const.calorie / const.value("Hartree energy") / const.Avogadro


def get_avg_mom_inertia(mol):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required for this PR, but I wonder if this should be made a @Property of the Molecule class? Any strong opinions one way or another?


class QuasiRRHO:
"""
Class to calculate thermochemistry using Grimme's Quasi-RRHO approximation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding to the docstring! Since you opened this PR, we have also added a nice system for citations in pymatgen based on duecredit. Would you mind adding a @due.dcite decorator to this class? (example)

ev *= R
etot = (et + er + ev) * kcal2hartree / 1000
self.h_corrected = etot + R * self.temp * kcal2hartree / 1000
molarity_corr = R_ha * self.temp * np.log(R_volume * self.temp * self.conc)
Copy link
Contributor

@rkingsbury rkingsbury Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the molarity correction is RT log (RTC). Shouldn't it just be RT log(C)? I may be ignorant, but please double check / confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this formulation implicitly assumes a standard state of 1 M. It might be good to state that in the docstring in case someone is using a calc with a different standard state

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following Wheeler's script for the concentration correction. I believe it comes from a thermodynamic cycle for cluster continuum models that leads to an extra RT log(R_volume T). It's equations 7 and 8 in the Ho and Coote pKa paper.
Thanks for the point about the 1M assumption. I will add that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for clarifying. So looking at that section of the paper:
image

I'm pretty sure the extra -RT log [R~T] term is there to adjust the reference states (the paragraph above and the different units of R and R~ are the clues). The solvation energy assumes 1 mol/L in both gas and solution, so you need a correction for that difference. The numerical value of R~T is 24.43 atm / (mol/L), which is the same value one might use to make a standard state correction.

So I think it should be RT ln C, and I'm glad we have the standard state in the docstring to hopefully call attention to the need to be careful interpreting the number. Does that make sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your explanation of the source of the extra term makes sense to me, but I don't understand why we shouldn't include the standard state correction when providing a concentration-corrected free energy in solvent. If RTln(concentration) uses mol/L, shouldn't we also correct for the 1 atm standard state used in the DFT code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. To me what makes it tricky is that you only need to do the standard state correction when you are writing a reaction energy between species with different reference states, but this module just computes a single correction for a single species. (in other words, the reason for the correction is to make sure all species use the same reference state). We don't necessarily know that users of this class are going to take this free energy and combine it with a gas phase energy.

The other small issue is that I think not every DFT code necessarily uses the same reference state. But I think both Q-Chem and Gaussian use 1 atm for gas phase calcs, and as long as we document what we're doing I'm not too worried about that.

Personally I think it will be more transparent and less confusing to users to just do RT ln C and let them adjust as needed if they have to change reference states. But again, as long as we document what's going on it could be OK to build it in.

If you want to build it in, I suggest writing it as ( RT log ( C) - RT log (1/24.55) ) and adding a comment line to make explicitly clear that you're dividing C by the molarity of an ideal gas at 1 atm.

Do others like @samblau or @espottesmith or @materialsproject/second-foundation have thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree that we should be more specific in the documentation, and I'll put that in. Maybe with a reference to describe it in detail if I can find a really good one. I also think of the free energy in implicit solvent as already containing an assumed change in free energy from vacuum to solution, and would thus include the standard state correction when using implicit solvent. Is that correct?

I would also appreciate input from everyone you tagged!

I think another option would be to just entirely remove the concentration correction for now. It is somewhat just a legacy from converting a script that was useful to me into Pymatgen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree that we should be more specific in the documentation, and I'll put that in. Maybe with a reference to describe it in detail if I can find a really good one. I also think of the free energy in implicit solvent as already containing an assumed change in free energy from vacuum to solution, and would thus include the standard state correction when using implicit solvent. Is that correct?

I just think of the free energy of an isolated molecule as "DFT energy (~ 0K enthalpy) plus T-dependent enthalpy and entropy".

My understanding is that implicit solvent models adjust the free energy for interactions between the solute and the medium but do not account for changes in standard state (vac -> solution). Unless we're talking about formation free energy or a reaction energy, the free energy of an isolated molecule begs the question "relative to what"? And you don't have to introduce a standard/reference state until you answer that question. I know some solvent models output a solvation (reaction) energy, but I believe that implicitly assumes the same reference state for both vacuum and solvent. There's nothing inherently wrong with that but it's often not what the end user needs for comparison to experiment.

I think another option would be to just entirely remove the concentration correction for now. It is somewhat just a legacy from converting a script that was useful to me into Pymatgen.

Yeah, the more I think about it, that might be the cleanest thing to do here. This module is for calculating QRRHO thermochemistry, which (unless I'm mistaken) does not specifically have anything to do with an implicit solvent model and does not make any assumptions about the concentration; it's just a way of dampening low-frequency contributions to rot and vib entropy. So in the same way that standard thermochemistry output in Gaussian/Q-Chem doesn't make any adjustments for concentration, maybe this shouldn't either.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not entirely convinced, mainly because implicit solvent models are made to capture solvation free energies, but I definitely understand what you're saying. I'd love to have this conversation offline more too, given how fundamentally important it is to get these corrections correct, but I agree that that can be beyond the scope of this PR. I'll remove it for now and maybe if Pymatgen wants to start recommending free energy treatments we can implement more general thermochemistry!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to discuss offline!

Alex Epstein and others added 4 commits August 2, 2023 10:13
fixed typo in units
added duecredit decorator
update 1M in docs
fixing bugs
@@ -83,7 +84,7 @@ class QuasiRRHO:
Attributes:
temp (float): Temperature [K]
press (float): Pressure [Pa]
conc (float): Solvent concentration [M]
conc (float): Solvent concentration. Assumes 1M unless specified [M]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add "Concentration correction is reference to a 1 M standard state."

arepstein and others added 2 commits August 11, 2023 12:08
Removed solvent concentration corrections from the code so the class only deals with QuasiRRHO corrections
@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@57d8a2f). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #2028   +/-   ##
=========================================
  Coverage          ?   74.06%           
=========================================
  Files             ?      230           
  Lines             ?    69403           
  Branches          ?    16161           
=========================================
  Hits              ?    51403           
  Misses            ?    14957           
  Partials          ?     3043           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rkingsbury
Copy link
Contributor

I think this is ready to merge, right @arepstein ? @janosh can you remove the 'stale' label?

@arepstein
Copy link
Author

Yes, I believe ready to merge!

@janosh janosh added enhancement A new feature or improvement to an existing one molecules Molecule stuff analysis Concerning pymatgen.analysis and removed stale Abandoned or conflicting PRs and outdated issues labels Aug 21, 2023
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's been a recent change of the pymatgen test folder structure. The file test_files/molecules/co2.log now needs to in tests/files/.... Would also be good to gzip this file.

Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @arepstein! 👍

@janosh janosh merged commit 0d5eed0 into materialsproject:master Aug 21, 2023
@arepstein
Copy link
Author

Thanks @janosh and @rkingsbury for all the help, edits, and revisions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis Concerning pymatgen.analysis enhancement A new feature or improvement to an existing one molecules Molecule stuff
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants