uncertainty quantification in energy corrections #1558

awvio · 2019-08-21T22:02:09Z

Summary

Added uncertainty quantification for energy corrections in compatibility module.

additional yaml file that contains uncertainties for each correction
get_correction() methods now return (correction, uncertainty)
process_entry() adds correction uncertainty to entry.data dictionary under key: 'correction_uncertainty'
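
As a rough illustration of the interface change listed above, the sketch below uses hypothetical stand-in classes (`Entry`, `ConstantCorrection`) rather than pymatgen's real ones. It assumes the per-correction uncertainties are independent and combines them in quadrature; the thread does not say how the PR itself combines them.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for a computed entry (not pymatgen's class)."""
    energy: float
    correction: float = 0.0
    data: dict = field(default_factory=dict)


class ConstantCorrection:
    """Hypothetical correction that returns a fixed (value, uncertainty) pair."""

    def __init__(self, value: float, uncertainty: float) -> None:
        self.value = value
        self.uncertainty = uncertainty

    def get_correction(self, entry: Entry) -> tuple:
        # Mirrors the new return shape: (correction, uncertainty), both in eV.
        return self.value, self.uncertainty


def process_entry(entry: Entry, corrections: list) -> Entry:
    """Sum the corrections and store the combined uncertainty on the entry."""
    total = 0.0
    variance = 0.0
    for corr in corrections:
        value, err = corr.get_correction(entry)
        total += value
        variance += err ** 2  # assumption: independent errors add in quadrature
    entry.correction = total
    entry.data["correction_uncertainty"] = math.sqrt(variance)
    return entry
```

With two corrections of -0.3 ± 0.003 eV and -0.4 ± 0.004 eV, this sketch would store a total correction of -0.7 eV and a combined uncertainty of about 0.005 eV.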

Additional dependencies introduced (if any)

Plotly

TODO (if any)

WIP (needs review)

shyuep · 2019-08-22T16:05:47Z

I think I need a longer summary to know what this is for?

awvio · 2019-08-24T04:54:00Z

Hi, sorry I'll elaborate.

The corrections in MPCompatibility.yaml were refit and now have standard error values associated with them (from imperfect fits as well as uncertainty in experimental data). I'll add the code that calculates the corrections and their errors soon. The Correction classes, which previously returned a correction for a given entry, now also return the error on that correction. The Compatibility classes, which previously processed an entry by calculating a total correction to its energy, now also include the error on that total correction.

mkhorton · 2019-08-25T18:03:41Z

Just to make some introductions, @awvio is working with us (@shyamd, Persson group) to help improve the MP corrections.

@awvio I wasn't familiar with your GitHub handle, so I was also confused for a moment :)

shyuep · 2019-08-30T19:59:13Z

@mkhorton @kristinpersson @shyamd I think this needs to be discussed. I do understand of course that corrections can be changed/improved. I have no problems with putting uncertainties in. But this is a rather extensive set of changes and it is possible that there are codes out there relying on the existing correction values. Some of the corrections are not small at all.

We should have a proper review of all the correction values, as well as a possible strategy to have versioned corrections based on MP database versions. This cannot be done just by someone editing the code.

mkhorton · 2019-08-30T20:02:33Z

Yes, agreed this cannot be done solely in code. Perhaps worth discussing in next MP dissemination meeting; it'll need to be staged with the next database release, and have accompanying docs. Agreed also we need to store the MPCompatibility version (which could be pymatgen version even) in the database, and potentially in the resulting ComputedEntries also.

shyamd · 2019-08-30T20:12:00Z

There are plans for a docs page and a paper discussing these corrections and their effects. For the most part these don't change the old values by much. The big difference is that these corrections can be reproduced, unlike the old numbers, which I've never been able to recreate.

shyuep · 2019-08-30T20:20:37Z

The changes are large, by ~0.5 eV/atom in some cases.

MP has gone from 2012 to 2019 with many generations of db changes along the way. I do not doubt that it would be difficult to reproduce the old values (some of which were done using proper AFM structures). The original values were actually documented in the pymatgen publication.

The point remains this is an exercise that requires careful discussion and vetting. A doc page and publication would be good. But this will not be merged until all the pieces are in place.

shyamd

We should consider Gzip'ing the two json files of entries since they are not human-readable as is, so they might as well be binary blobs.
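
A minimal sketch of the gzip suggestion using only the Python standard library. The helper names and the file name used in the usage note are hypothetical, and this is not the code that was eventually committed.

```python
import gzip
import json


def dump_json_gz(obj, path: str) -> None:
    """Write a JSON-serializable object to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(obj, handle)


def load_json_gz(path: str):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `dump_json_gz(entries, "exp_entries.json.gz")` followed by `load_json_gz("exp_entries.json.gz")` should round-trip the data unchanged.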

shyamd · 2019-08-30T20:12:47Z

pymatgen/entries/MPCompatibility.yaml

-      W:  -4.351           #Fit to WO2 and WO3 (BURP: -2.762)
-      V:  -1.682           #Fit to V2O3 and V2O5 (VO2 fit is way off) (BURP: -1.764)
-      Ni: -2.164           #Based on burp version as of Feb 28 2011
+      V: -1.634860584846134


We probably don't need so many sig figs. The corrections are definitely not good sub 1 meV, so let's round to that for now unless someone has a different opinion

rkingsbury · 2019-08-30

I agree with rounding to 1 meV.

awvio · 2019-08-30

Does this also apply to the correction errors? Rounding to 3 decimal places often results in only 1 sigfig (error = 0.00x), at most 2.

shyamd · 2019-08-30T20:14:03Z

pymatgen/entries/compatibility.py

@@ -648,3 +721,277 @@ def __init__(self, compat_type="Advanced", correct_peroxide=True,
             GasCorrection(fp),
             AnionCorrection(fp, correct_peroxide=correct_peroxide),
             UCorrection(fp, MPRelaxSet, compat_type), AqueousCorrection(fp)])
+
+
+class CorrectionCalculator:


Let's move this to a new file. compatibility.py is getting too dense.

shyamd · 2019-08-30T20:14:23Z

pymatgen/entries/compatibility.py

+    species = ['oxide', 'peroxide', 'superoxide', 'F', 'Cl', 'Br', 'I', 'N', 'S', 'Se',\
+               'Si', 'Sb', 'Te', 'V', 'Cr', 'Mn', 'Fe', 'Co','Ni', 'Cu', 'Mo'] #species that we're fitting corrections for
+
+    def __init__(self, exp_json, comp_json):


Add docstrings for all the methods. Let's also add type hints.
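
A sketch of what the requested docstrings and type hints might look like on the constructor shown in the diff. The class name `CorrectionCalculatorSketch` is hypothetical, and the assumption that `exp_json` and `comp_json` are file paths is a guess from the parameter names, not something the thread confirms.

```python
from typing import Dict, List


class CorrectionCalculatorSketch:
    """Illustrative skeleton only; not the class added in this PR."""

    def __init__(self, exp_json: str, comp_json: str) -> None:
        """Store the input locations and initialize empty results.

        Args:
            exp_json: Path to the file of experimental entries
                (assumed here to be a file path).
            comp_json: Path to the file of computed entries
                (assumed here to be a file path).
        """
        self.exp_json = exp_json
        self.comp_json = comp_json
        # Fitted correction values, filled in by a later fitting step.
        self.corrections: List[float] = []
        # Mapping of species label to fitted correction value.
        self.corrections_dict: Dict[str, float] = {}
```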

shyamd · 2019-08-30T20:15:44Z

pymatgen/entries/compatibility.py

+        return self.corrections_dict
+
+
+    def graph_residual_error(self):


Switch to plotly plots so that we can have them interactive and annotated for more information.

shyamd · 2019-08-30T20:16:08Z

pymatgen/entries/compatibility.py

+        if len(self.corrections) == 0:
+            self.compute_corrections()
+
+        aqueous = OrderedDict()


Add a comment mentioning these come from the old YAML file and are not yet auto-generated by this class.

shyamd · 2019-08-30T20:20:47Z

pymatgen/entries/MPCompatibility.yaml

-  O2:  -0.316731
-  N2:  -0.295729
-  F2:  -0.313025
+  sulfide: -0.6314172089245144


I'm tempted to get rid of the Sulfide correction class and just treat it as another Anion correction since we didn't find a big difference for polysulfides.

rkingsbury · 2019-08-30

I agree. In a forthcoming commit we've consolidated OxideCorrection and SulfideCorrection into one AnionCorrections key. This will make it clearer which class is calling the corrections as well.

shyuep · 2019-08-30T20:28:10Z

Just to give an example, Fe3O4 in MP seems to be ferromagnetic. The actual ground state is ferrimagnetic. The energy diffs are not small. This is a point I have made since 2013.

The original corrections are fitted to binary oxides that were properly charge-ordered and magnetized.

rkingsbury · 2019-08-30T21:30:02Z

I would say you could carry 1 extra digit for the uncertainty (so, round uncertainty to 0.1 meV).
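
The rounding convention discussed in this review thread can be sketched as follows. The function name is hypothetical, the correction value is the refit V value from the diff above, and the uncertainty value is made up for illustration.

```python
def round_correction(value: float, uncertainty: float) -> tuple:
    """Round a correction to 1 meV and its uncertainty to 0.1 meV (both in eV)."""
    return round(value, 3), round(uncertainty, 4)


# Refit V correction from the diff, paired with a made-up uncertainty.
rounded_value, rounded_error = round_correction(-1.634860584846134, 0.00482317)
```

Here `rounded_value` is `-1.635` and `rounded_error` is `0.0048`. Rounding the same uncertainty to 3 decimal places would give `0.005`, a single significant figure, which is the concern raised above.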

Remove polysulfide logic from sulfide_type()

rkingsbury · 2019-09-10T21:48:09Z

I can't figure out why my last commit about polysulfide caused the Travis Build to fail. The test that's failing (ReactionDiagramTest.test_formed_formula) doesn't touch any of the code I changed as far as I can tell.

mkhorton · 2019-09-10T22:23:49Z

A good start may be to modify the ReactionDiagramTest.test_formed_formula -- it doesn't look like a very good test to me because it's not clear what's changed when it fails (I appreciate you haven't modified this file/it's not your test).

Currently it's:

self.assertTrue(formula in formed_formula)

Maybe:

self.assertIn(formula, formed_formula)

I don't know who wrote that test originally.
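
To show why the second form is easier to debug, the sketch below captures the failure message produced by each assertion. The formula list is hypothetical stand-in data, not the values used in `ReactionDiagramTest`.

```python
import unittest


def failure_message(check) -> str:
    """Run a zero-argument check and return its AssertionError text ('' on success)."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return ""


tc = unittest.TestCase()
formed_formula = ["Li2O", "Li2O2"]  # hypothetical stand-in data
formula = "LiO2"

# assertTrue only reports that the boolean expression was false.
msg_true = failure_message(lambda: tc.assertTrue(formula in formed_formula))
# assertIn reports both the missing item and the container contents.
msg_in = failure_message(lambda: tc.assertIn(formula, formed_formula))
```

`msg_true` is `False is not true`, while `msg_in` is `'LiO2' not found in ['Li2O', 'Li2O2']`, so a CI log from the second form shows what actually differed.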

rkingsbury · 2019-09-10T22:25:52Z

@mkhorton OK, I can try that. The test runs successfully on my local machine and on AppVeyor though (without modifying the assertion statement). Could it be something about the Travis environment?

mkhorton · 2019-09-10T22:28:15Z

Yeah, it looks like the kind of test that might have machine-specific precision issues but I'm not sure, I suspect the test could be improved.

mkhorton · 2020-08-14T22:22:50Z

I think it's time to merge, thank you @awvio and @rkingsbury for the very extensive work done here.

To re-iterate changes, the major changes in this PR are a new CorrectionCalculator class to reproducibly generate formation enthalpy corrections for DFT energies from experimental data, including the addition of uncertainties to these corrections from the corresponding experimental uncertainties. This comes with a new MP2020 scheme containing an example fit along with numerous smaller improvements.
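
For readers unfamiliar with how a fit can yield both corrections and uncertainties, here is a generic weighted least-squares sketch with NumPy. It is an assumption-laden illustration, not the `CorrectionCalculator` implementation: the function name, the linear model (energy residual per atom equals species fraction times correction), and the toy data are all hypothetical.

```python
import numpy as np


def fit_corrections(fractions, residuals, sigmas):
    """Weighted linear least squares for per-species corrections.

    fractions: (n_compounds, n_species) fraction of each corrected species.
    residuals: (n_compounds,) experimental minus computed energy, eV/atom.
    sigmas:    (n_compounds,) experimental uncertainties, eV/atom.
    Returns (corrections, standard_errors), one value per species.
    """
    A = np.asarray(fractions, dtype=float)
    y = np.asarray(residuals, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2  # inverse-variance weights
    AtW = A.T * w  # scales column k of A.T by the weight of compound k
    cov = np.linalg.inv(AtW @ A)  # parameter covariance matrix
    corrections = cov @ (AtW @ y)
    return corrections, np.sqrt(np.diag(cov))


# Toy data: two species, three compounds, residuals built from known corrections.
fractions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
true_corrections = np.array([-0.7, -1.6])
residuals = np.asarray(fractions) @ true_corrections
sigmas = [0.01, 0.02, 0.01]
corrections, errors = fit_corrections(fractions, residuals, sigmas)
```

Because the toy residuals are generated exactly from `true_corrections`, the fit recovers those values. The reported standard errors depend only on the assumed experimental uncertainties and the compositions.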

For any external pymatgen users reading this, note that the specific data and specific values in this MP2020 scheme are subject to change until we reach publication status, and @rkingsbury is planning further iterative improvements. This PR contains the machinery of the corrections class, and it is this machinery which has now been finalized.

The current MP correction scheme remains usable and unchanged by this PR.

Happy Friday everyone!

added correction error to compatibility module with tests

ab7cbda

mkhorton changed the title ~~correction errors in compatibility module (WIP)~~ [WIP] correction errors in compatibility module Aug 29, 2019

rkingsbury and others added 4 commits August 29, 2019 16:27

merge Amanda's correction work into new branch

fdf0906

Reformat .yaml keys to match correction classes

c18d772

added CorrectionCalculator class, the json files of calc/exp entries,…

bdee700

… new MPCompatibility and Errors yaml

'merged

6001f99

shyamd reviewed Aug 30, 2019

awvio and others added 10 commits August 30, 2019 14:56

merged format changes to yaml and made corresponding changes to Anion…

e305f4c

…Correction class; moved CorrectionCalculator to separate file

added docstrings and type hints to CorrectionCalculator

92f8e52

fixed defaultdict import, switched to plotly

ba4e971

ran style checker

24b4c27

manually fixed other style errors

4c149fd

changed json files containing exp/calc entries to gzip, removed no lo…

70722e3

…nger needed files

fixed new style errors

15aea74

changed input filename back to MPCompatibility.yaml, changed tests ac…

74ce229

…cording to smaller number of sigfigs for corrections

Remove polysulfide logic from sulfide_type()

7e8d5aa

Remove polysulfide logic from sulfide_type()

17a2adb

Remove polysulfide logic from sulfide_type()

rkingsbury and others added 28 commits July 22, 2020 09:30

test fixes; make uncertainty kwarg always

878734b

remove pylint C0330 directive

99fe7a7

Merge branch 'master' of git://github.com/materialsproject/pymatgen i…

f5f048d

…nto corrections

add @staticmethod to explain

dfda9ef

allow user to input species list

298ac4f

pycodestyle fix, clarify .json.gz file in compute_from_files docstring

2c97a94

allow_unstable can be bool or float, polyanions filtered based on use…

19ea07f

…r provided list

Merge branch 'master' of git://github.com/materialsproject/pymatgen i…

9ae9d54

…nto corrections

missing import

155d2a5

change unknown uncertainties to np.nan instead of 0

e44e47f

pycodestyle split line with too many characters

4c98554

pycodestyle indent

21d3349

pylint trailing whitespace

8ffd207

fix merge problem

2cd8073

Fix typo

635414d

revert sulfide_type docstring change

9556fe1

make kwargs more verbose

bb5543f

label corrections kwarg in tests

3467c57

lint docstring

c463246

Merge branch 'master' of https://github.com/awvio/pymatgen into corre…

255749c

…ctions

whitespace

c94963e

Merge branch 'master' of https://github.com/awvio/pymatgen; branch 'm…

3f7f475

…aster' of git://github.com/materialsproject/pymatgen into corrections

replace .gz with .json.gz extension

9e21a5e

update file extension in test file

9e75eaa

changed sulfide back to S

e03ad90

Merge branch 'master' of git://github.com/materialsproject/pymatgen i…

eac0756

…nto corrections

Merge branch 'master' of https://github.com/awvio/pymatgen; branch 'm…

f24317b

…aster' of git://github.com/materialsproject/pymatgen into corrections

rename corrections_calc.py

f83d344

mkhorton merged commit 3e2fcce into materialsproject:master Aug 14, 2020
