From 4422da9ea8f20bdbcaa09c1104c59315bd47f9a8 Mon Sep 17 00:00:00 2001 From: Ryan Kingsbury Date: Mon, 21 Nov 2022 07:29:37 -0800 Subject: [PATCH 1/4] Create 0002-scope-of-emmet-models.md --- decisions/0002-scope-of-emmet-models.md | 63 +++++++++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 decisions/0002-scope-of-emmet-models.md diff --git a/decisions/0002-scope-of-emmet-models.md b/decisions/0002-scope-of-emmet-models.md new file mode 100644 index 0000000..d085de7 --- /dev/null +++ b/decisions/0002-scope-of-emmet-models.md @@ -0,0 +1,63 @@ +# Scope of emmet document models + +## Context and Problem Statement + +`emmet` currently contains document model that define schemas for certain calculation +types, particularly from VASP. The newer VASP schemas developed alongside `atomate2` are +now fairly stable, and hence should be merged into `emmet`. This raises several questions +about backwards compatibility with legacy document models and with dependendcy handling. Specifically: + +1. The `atomate2` documents contain some fields and changed field names compared to `atomate1` +2. `atomate1` task documents use `task_id` whereas `atomate2` documents use `UUID` as identifiers. +3. Many `atomate2` document models contain logic for instantiating from raw calculation + outputs, e.g. `from_vasp_files()`. Should these live in the emmet document models, or + elsewhere? +4. The methods in item 3) may create dependency bloat. How to handle? +5. When should a document model "graduate" from `atomate2` to `emmet`? +6. Is the purpose of `emmet-core` to power MP, or to host general document models for the community? + +Clarification of the above is also important not only for updating our VASP models, but +to guide other work-in-progress document models in `atomate2` (e.g., Q-Chem) + +## Decision Drivers + +- It is often unfeasible to re-parse existing MP calculations. So any changes to key names or + reorganization of task documents needs to be accompanied by a database migration script +- Alternatively, `atomate2` document models can just *add* keys without modifying existing ones +- As part of updating document models, the meaning of `task_label` should be revisitied b/c it is sometimes overloaded by users, which breaks queries + +## Considered Options + +### Option 1: Use optional dependencies + +Dependecies that are only required by specific methods (e.g. `from_vasp_files()`) will be made optional and imported from within the method itself. We make sure that only non-optional dependencies are required to instantiate a document model with arbitrary data. @utf or other `atomate2` maintainers can be made maintainers of `emmet` as well to reduce development friction. + +- Good, because this avoids dependency bloat +- Good, because this allows methods that instantiate Task Docs from calculation outputs to be defined in the same place as the Task Doc itself +- Neutral, because making `atomate2` maintainers `emmet` maintainers minimizes development friction + +### Option 2: Keep calculation code-specific dependencies in separate packages + +`emmet` can be used to define code-agnostic document models (as is currently the case in `emmet-core`), while document models specific to a code (e.g. VASP) would live in separate I/O packages e.g. `pymatgen.io.vasp`. + +- Good, because this would keep most of the code maintenance that requires detailed knowledge of a specific calculation code in one repo. +- Bad, because this approach will result in document models defined in multiple places. + +### Proposed clarification of the purpose of `emmet-core` + +The purpose of `emmet-core` is to power MP and to provide examples and base classes (e.g. `TaskDocument`, `PropertyDoc`) that the community can build on. We can encourage users of +other codes or new methods to check `emmet` first to see if a document model exists for that +calculation type, and if it doesn't, to develop the new model in `atomate2` alongside their workflows. `emmet` and `atomate2` maintainers / MP Staff can make decisiosn about whether and when +it makes sense to port any of these into `emmet`. + +- It is not sustainable for `emmet` to be a general home for document models of any type, because of the sheer number of codes and calculation types + +## Decision Outcome + +TBD + +## More Information + +[emmet issue #517](https://github.com/materialsproject/emmet/issues/517) + +[atomate 2 #150](https://github.com/materialsproject/atomate2/issues/150) From c6260d37be1e7b06091bb3614a693cfbfb88638b Mon Sep 17 00:00:00 2001 From: Alex Ganose Date: Mon, 6 Feb 2023 13:35:44 +0000 Subject: [PATCH 2/4] Fix typos and formatting in decision 002. --- decisions/0002-scope-of-emmet-models.md | 80 ++++++++++++++++--------- 1 file changed, 52 insertions(+), 28 deletions(-) diff --git a/decisions/0002-scope-of-emmet-models.md b/decisions/0002-scope-of-emmet-models.md index d085de7..0b3e248 100644 --- a/decisions/0002-scope-of-emmet-models.md +++ b/decisions/0002-scope-of-emmet-models.md @@ -2,55 +2,79 @@ ## Context and Problem Statement -`emmet` currently contains document model that define schemas for certain calculation -types, particularly from VASP. The newer VASP schemas developed alongside `atomate2` are -now fairly stable, and hence should be merged into `emmet`. This raises several questions -about backwards compatibility with legacy document models and with dependendcy handling. Specifically: - -1. The `atomate2` documents contain some fields and changed field names compared to `atomate1` -2. `atomate1` task documents use `task_id` whereas `atomate2` documents use `UUID` as identifiers. -3. Many `atomate2` document models contain logic for instantiating from raw calculation - outputs, e.g. `from_vasp_files()`. Should these live in the emmet document models, or - elsewhere? -4. The methods in item 3) may create dependency bloat. How to handle? +`emmet` currently contains the document models that define schemas for certain +calculation types, particularly from VASP. The newer VASP schemas developed +alongside `atomate2` are now fairly stable, and hence should be merged into +`emmet`. This raises several questions about backward compatibility with legacy +document models and with dependency handling. Specifically: + +1. The `atomate2` documents contain new fields and changed field names compared + to `atomate1`. +2. `atomate1` task documents use `task_id` whereas `atomate2` documents use + `UUID` as identifiers. +3. Many `atomate2` document models contain logic for instantiating from raw + calculation outputs, e.g. `from_vasp_files()`. Should these live in the emmet + document models, or elsewhere? +4. The methods in item 3) may create dependency bloat - how should this be + handled? 5. When should a document model "graduate" from `atomate2` to `emmet`? -6. Is the purpose of `emmet-core` to power MP, or to host general document models for the community? +6. Is the purpose of `emmet-core` to power MP, or to host general document + models for the community? -Clarification of the above is also important not only for updating our VASP models, but -to guide other work-in-progress document models in `atomate2` (e.g., Q-Chem) +Clarification of the above is also important not only for updating our VASP +models, but to guide other work-in-progress document models in `atomate2` (e.g., +Q-Chem). ## Decision Drivers -- It is often unfeasible to re-parse existing MP calculations. So any changes to key names or - reorganization of task documents needs to be accompanied by a database migration script -- Alternatively, `atomate2` document models can just *add* keys without modifying existing ones -- As part of updating document models, the meaning of `task_label` should be revisitied b/c it is sometimes overloaded by users, which breaks queries +- It is often unfeasible to re-parse existing MP calculations. Any changes to + key names or reorganization of task documents need to be accompanied by a + database migration script. +- Alternatively, `atomate2` document models can just *add* keys without + modifying existing ones. +- As part of updating document models, the meaning of `task_label` should be + revisited because it is sometimes overloaded by users, which breaks queries. ## Considered Options ### Option 1: Use optional dependencies -Dependecies that are only required by specific methods (e.g. `from_vasp_files()`) will be made optional and imported from within the method itself. We make sure that only non-optional dependencies are required to instantiate a document model with arbitrary data. @utf or other `atomate2` maintainers can be made maintainers of `emmet` as well to reduce development friction. +Dependencies that are only required by specific methods (e.g. +`from_vasp_files()`) will be made optional and imported from within the method +itself. We make sure that only non-optional dependencies are required to +instantiate a document model with arbitrary data. @utf or other `atomate2` +maintainers can be made maintainers of `emmet` as well to reduce development +friction. - Good, because this avoids dependency bloat -- Good, because this allows methods that instantiate Task Docs from calculation outputs to be defined in the same place as the Task Doc itself -- Neutral, because making `atomate2` maintainers `emmet` maintainers minimizes development friction +- Good, because this allows methods that instantiate Task Docs from calculation + outputs to be defined in the same place as the Task Doc itself +- Neutral, because making `atomate2` maintainers `emmet` maintainers minimizes + development friction ### Option 2: Keep calculation code-specific dependencies in separate packages -`emmet` can be used to define code-agnostic document models (as is currently the case in `emmet-core`), while document models specific to a code (e.g. VASP) would live in separate I/O packages e.g. `pymatgen.io.vasp`. +`emmet` can be used to define code-agnostic document models (as is currently the +case in `emmet-core`), while document models specific to a code (e.g. VASP) +would live in separate I/O packages e.g. `pymatgen.io.vasp`. -- Good, because this would keep most of the code maintenance that requires detailed knowledge of a specific calculation code in one repo. -- Bad, because this approach will result in document models defined in multiple places. +- Good, because this would keep most of the code maintenance that requires + detailed knowledge of a specific calculation code in one repo. +- Bad, because this approach will result in document models defined in multiple + places. ### Proposed clarification of the purpose of `emmet-core` -The purpose of `emmet-core` is to power MP and to provide examples and base classes (e.g. `TaskDocument`, `PropertyDoc`) that the community can build on. We can encourage users of -other codes or new methods to check `emmet` first to see if a document model exists for that -calculation type, and if it doesn't, to develop the new model in `atomate2` alongside their workflows. `emmet` and `atomate2` maintainers / MP Staff can make decisiosn about whether and when +The purpose of `emmet-core` is to power MP and to provide examples and base +classes (e.g. `TaskDocument`, `PropertyDoc`) that the community can build on. We +can encourage users of other codes or new methods to check `emmet` first to see +if a document model exists for that calculation type, and if it doesn't, to +develop the new model in `atomate2` alongside their workflows. `emmet` and +`atomate2` maintainers / MP Staff can make decisions about whether and when it makes sense to port any of these into `emmet`. -- It is not sustainable for `emmet` to be a general home for document models of any type, because of the sheer number of codes and calculation types +- It is not sustainable for `emmet` to be a general home for document models of + any type, because of the sheer number of codes and calculation types ## Decision Outcome From f6a868afae9acef4a5bdfaab0db65d624bc6a03e Mon Sep 17 00:00:00 2001 From: Alex Ganose Date: Mon, 6 Feb 2023 13:44:32 +0000 Subject: [PATCH 3/4] Add decision outcome for 002. --- decisions/0002-scope-of-emmet-models.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/decisions/0002-scope-of-emmet-models.md b/decisions/0002-scope-of-emmet-models.md index 0b3e248..9a4fb0e 100644 --- a/decisions/0002-scope-of-emmet-models.md +++ b/decisions/0002-scope-of-emmet-models.md @@ -78,7 +78,19 @@ it makes sense to port any of these into `emmet`. ## Decision Outcome -TBD +The decision was taken to proceed with option 2. `emmet` will be used to house +documents that are used in the Materials Project website and core workflows. +Documents that provide `from_files` functions will ensure that all imports are +performed inside the function itself to prevent dependency bloat. + +To ensure maximum compatibility with the existing set of parsed calculation +data, all field names that have been changed in `atomate2` will be updated to +match the field names in `emmet`. New fields such as `UUID` will be kept. + +It is expected that some task documents may start out in `atomate2` but +ultimately become part of the Materials Project infrastructure. When this +happens the documents will be moved to `emmet`. These decisions will be made on +a case-by-case basis. ## More Information From 6d0de264745b3a80046015060ba66a7e2042b57a Mon Sep 17 00:00:00 2001 From: Alex Ganose Date: Mon, 6 Feb 2023 16:58:41 +0000 Subject: [PATCH 4/4] Update 0002-scope-of-emmet-models.md --- decisions/0002-scope-of-emmet-models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/decisions/0002-scope-of-emmet-models.md b/decisions/0002-scope-of-emmet-models.md index 9a4fb0e..c1bc7a6 100644 --- a/decisions/0002-scope-of-emmet-models.md +++ b/decisions/0002-scope-of-emmet-models.md @@ -78,7 +78,7 @@ it makes sense to port any of these into `emmet`. ## Decision Outcome -The decision was taken to proceed with option 2. `emmet` will be used to house +The decision was taken to proceed with option 1. `emmet` will be used to house documents that are used in the Materials Project website and core workflows. Documents that provide `from_files` functions will ensure that all imports are performed inside the function itself to prevent dependency bloat.