Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Q] How to calculate host+guest binding free energy and enthalpy changes from CREGEN output? #11

Closed
ghost opened this issue Sep 10, 2020 · 3 comments

Comments

@ghost
Copy link

ghost commented Sep 10, 2020

Greetings. I want to calculate the free energy & enthalpy changes for the binding of a series of 1+ small molecule guests, to a neutral macrocyclic host, in water. I've performed a set of calculations like the following for the host, guests and guest@host complexes:

 crest [email protected] -chrg 1 -g h2o -nci -mdlen 10 -prop hess -keepdir 

Below is the CREGEN output from one of these guest@host runs, from which I calculate the free energy and enthalpy at 298K for the final ensemble as follows:

G(guest@host) = reference state Etot -323.174960188900 Ha - ensemble free energy (-1.280 kcal/mol) / 627.5 kcal/mol/Ha = -323.17292 Ha

and

H(guest@host) = G(guest@host -323.17292 Ha) + 298.15K * ensemble entropy (4.292 cal/mol K) / 1000 cal/kcal / 627.5 kcal/mol/Ha = -323.17088 Ha

and then, from identical calculations for the free host and free guests
deltaG(guest@host)= G(guest@host) - G(host) - G(guest)

Are the above calcs correct? In particular, am I treating the sign of the ensemble free energy correctly?

CREGEN - CONFORMER SYMMETRY ANALYSIS
threads = 12

input file name : crest_property.xyz
output file name : crest_property.xyz.sorted
number of atoms : 174
number of points on xyz files : 47
RMSD threshold : 0.1250
Bconst threshold : 15.0000
population threshold : 0.0500
conformer energy window /kcal : 6.0000
fragment in coord : 2
number of reliable points : 47
reference state Etot : -323.174960188900
running RMSDs...
done.
number of doubles removed by rot/RMSD : 0
total number unique points considered further : 47
Erel/kcal Etot weight/tot conformer set degen origin
1 0.000 -323.17496 0.26606 0.26606 1 1
2 0.144 -323.17473 0.20885 0.20885 2 1
3 0.462 -323.17422 0.12199 0.12199 3 1
... 44 entries omitted for brevity
T /K : 298.15
E lowest : -323.17496
ensemble average energy (kcal) : 0.496
ensemble entropy (J/mol K, cal/mol K) : 17.960 4.292
ensemble free energy (kcal/mol) : -1.280
population of lowest in % : 26.606
number of unique conformers for further calc 44
list of relative energies saved as "crest.energies"
Normal termination.

@pprcht
Copy link
Contributor

pprcht commented Sep 10, 2020

Hello,
In the calculation of G at temperature T your reference state is the total energy of your system, plus thermostatistical contributions from the respective frequency calculation and solvation free energy terms, so
G(T) = Egas+δGsolv(T)+GRRHO(T)
The ensemble free energy is another additive term Gconf(T)=-TSconf, so in your calculation it should be the opposite sign Gtot(T)=G(T)+Gconf(T). For the enthalpy I don't think it can be calculated like this, because the way it is written above one would end up with G(T) again.

Obtaining good ensemble entropies (and hence free energies) is very difficult. Short runtimes will result in incomplete ensembles and therefore in faulty ensemble free energies. Reproducibility can also be an issue. A part of our research is currently dedicated to these problems and we are hoping to finalize it soon with some precise recommendations for ensemble entropy calculations. This will be the update to version 2.11 of crest.

@ghost
Copy link
Author

ghost commented Sep 11, 2020

Thanks for this Philipp. Is it legitimate to calculate the guest@host binding enthalpy from the thermochemical output of separate vibrational frequency/hessian runs on the minimum energy ensemble structures obtained from a the above CREST runs? ie H(298K)guest@host-H(298)host-H(298)guest

Your point about short runtimes is well taken. In my systems of interest, using the default CREST -nci setup, luckily the host (cucurbit[8]uril) is quite rigid and by visual inspection of the MTD trajectories I can see the guests leave the CB8 after 2-5ps (depending on the Vbias parameter combination), so I end up with a significant proportion of cavity-bound structures versus surface-bound structures. In replicate runs I have been pleasantly surprised at how well CREST always finds the same minimum energy structures from very different starting poses and although the ensembles can differ a little, the distribution of the dominant low-energy structures is always nearly the same, producing less than 1kcal/mol differences in Gconf(T).

I have also done a few longer runs and it appears that, for these systems, the ensemble entropies converge after 30ps to give free energies that differ by less than 1kcal/mol from the 10ps runs. As even the shorter guest@host runs typically take 24-36 hrs on my old 12-core MacPro, I'm OK with the tradeoff of a small loss of accuracy.

Looking forward to what version 2.11 has to offer.

@pprcht
Copy link
Contributor

pprcht commented Sep 13, 2020

Yes, it is legimate to take G(T) (and related thermostatistical values) from seperate calculations and only use Gconf(T) from the crest output. In fact, this might even be the preferred way to do it since thermochemistry calculations purely at GFN level are not as reliable as, e.g. DFT level. Usually Gconf(T) is much smaller than G(T), so you want to have a good description of the latter. For these calculations and good post-processing of the GFN ensemble at DFT level you could take look at the enso repository. This is a project in which we mainly focus on those kind of things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant