Remove enum_values keywords in serialization #565

FabiPi3 · 2024-03-13T12:25:26Z

Instead of storing only the value of an Enum in the output, I suggest to store the whole enum class in order to be able to recreate it properly. This will be possible due to the new enum support in monty and after merging materialsvirtuallab/monty#640 this.

Auto-update pre-commit hooks

utf · 2024-03-13T12:29:07Z

I'm concerned this feature might cause an issue for downstream packages. Often we use Enums as keys for dictionaries. This feature in monty seems like it will break this behaviour since dictionaries cannot be hashed and therefore cannot be used as keys. Have you tried running the atomate2 tests with this new branch of jobflow?

FabiPi3 · 2024-03-13T13:28:47Z

@utf Thanks for your reply. I tried running the atomate2 tests locally and I can't see any issues arising when changing to my latest changes in monty and jobflow

utf · 2024-03-13T17:12:32Z

Thanks for checking @FabiPi3

codecov · 2024-03-13T17:14:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.42%. Comparing base (eda2a65) to head (978e66f).
Report is 14 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #565   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          21       21           
  Lines        1564     1564           
  Branches      425      425           
=======================================
  Hits         1555     1555           
  Misses          9        9

Files	Coverage Δ
src/jobflow/core/job.py	`100.00% <100.00%> (ø)`
src/jobflow/core/reference.py	`100.00% <100.00%> (ø)`

FabiPi3 · 2024-03-13T18:42:56Z

In general, I would guess that your described use case is still covered. Looking into the actual implementation of jsanitize, we see:

    if isinstance(obj, dict):
        return {
            str(k): jsanitize(
                v,
                strict=strict,
                allow_bson=allow_bson,
                enum_values=enum_values,
                recursive_msonable=recursive_msonable,
            )
            for k, v in obj.items()
        }

which means for me that only the values are serialized, the keys are still taken as strings. Hope this helps.

gpetretto · 2024-03-15T11:00:33Z

Hi @FabiPi3, I am also thinking that there might some consequences to this change. To illustrate this I made an example with different objects and (de)serialization, including pydantic base models, that are often present in jobflow's output.

Here is the code:

from monty.json import MSONable, jsanitize
from enum import Enum
from pydantic import BaseModel

class E(Enum):
    A = "A"

class ME(MSONable, Enum):
    B = "B"

class BM1(BaseModel):
    e: E = E.A
    
class BM2(BaseModel):
    me: ME = ME.B


d = {"e": E.A, "me": ME.B}

print("Dictionary")
for obj in [{"e": E.A}, {"me": ME.B}]:
    for enum_values in (True, False):
        try:
            print(f"{obj}, {enum_values}:", jsanitize(obj, strict=True, allow_bson=True, enum_values=enum_values))
        except AttributeError:
            print(f"{obj}, {enum_values}: failed")


print("\npydantic")
for cls in [BM1, BM2]:
    for enum_values in (True, False):
        try:
            print(f"{cls}, {enum_values}:", jsanitize(cls(), strict=True, allow_bson=True, enum_values=enum_values))
            try:
                print(cls.model_validate(jsanitize(cls(), strict=True, allow_bson=True, enum_values=enum_values)))
            except Exception as e:
                print(f"{obj}, {enum_values}: pydantic failed: {e}")    
        except AttributeError:
            print(f"{obj}, {enum_values}: jsanitize failed")

Here is the output with the standard monty version:

Dictionary
{'e': <E.A: 'A'>}, True: {'e': 'A'}
{'e': <E.A: 'A'>}, False: failed
{'me': <ME.B: 'B'>}, True: {'me': 'B'}
{'me': <ME.B: 'B'>}, False: {'me': {'@module': '__main__', '@class': 'ME', '@version': None, 'value': 'B'}}

pydantic
<class '__main__.BM1'>, True: {'e': 'A', '@module': '__main__', '@class': 'BM1', '@version': None}
e=<E.A: 'A'>
{'me': <ME.B: 'B'>}, False: jsanitize failed
<class '__main__.BM2'>, True: {'me': 'B', '@module': '__main__', '@class': 'BM2', '@version': None}
{'me': <ME.B: 'B'>}, True: pydantic failed: 1 validation error for BM2
me
  Value error, Must provide ME, the as_dict form, or the proper [type=value_error, input_value='B', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/value_error
<class '__main__.BM2'>, False: {'me': {'@module': '__main__', '@class': 'ME', '@version': None, 'value': 'B'}, '@module': '__main__', '@class': 'BM2', '@version': None}
me=<ME.B: 'B'>

Here is the output of your new monty version:

Dictionary
{'e': <E.A: 'A'>}, True: {'e': 'A'}
{'e': <E.A: 'A'>}, False: {'e': {'value': 'A', '@module': '__main__', '@class': 'E', '@version': None}}
{'me': <ME.B: 'B'>}, True: {'me': 'B'}
{'me': <ME.B: 'B'>}, False: {'me': {'@module': '__main__', '@class': 'ME', '@version': None, 'value': 'B'}}

pydantic
<class '__main__.BM1'>, True: {'e': 'A', '@module': '__main__', '@class': 'BM1', '@version': None}
e=<E.A: 'A'>
<class '__main__.BM1'>, False: {'e': {'value': 'A', '@module': '__main__', '@class': 'E', '@version': None}, '@module': '__main__', '@class': 'BM1', '@version': None}
{'me': <ME.B: 'B'>}, False: pydantic failed: 1 validation error for BM1
e
  Input should be 'A' [type=enum, input_value={'value': 'A', '@module':...: 'E', '@version': None}, input_type=dict]
<class '__main__.BM2'>, True: {'me': 'B', '@module': '__main__', '@class': 'BM2', '@version': None}
{'me': <ME.B: 'B'>}, True: pydantic failed: 1 validation error for BM2
me
  Value error, Must provide ME, the as_dict form, or the proper [type=value_error, input_value='B', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/value_error
<class '__main__.BM2'>, False: {'me': {'@module': '__main__', '@class': 'ME', '@version': None, 'value': 'B'}, '@module': '__main__', '@class': 'BM2', '@version': None}
me=<ME.B: 'B'>

and just to hopefully make it clearer, here is the diff of the two outputs:

3c3
< {'e': <E.A: 'A'>}, False: failed
---
> {'e': <E.A: 'A'>}, False: {'e': {'value': 'A', '@module': '__main__', '@class': 'E', '@version': None}}
10c10,13
< {'me': <ME.B: 'B'>}, False: jsanitize failed
---
> <class '__main__.BM1'>, False: {'e': {'value': 'A', '@module': '__main__', '@class': 'E', '@version': None}, '@module': '__main__', '@class': 'BM1', '@version': None}
> {'me': <ME.B: 'B'>}, False: pydantic failed: 1 validation error for BM1
> e
>   Input should be 'A' [type=enum, input_value={'value': 'A', '@module':...: 'E', '@version': None}, input_type=dict]

Considering that you propose to pass from jsanitize(strict=True, allow_bson=True, enum_values=True) to jsanitize(strict=True, allow_bson=True, enum_values=False), I can see some key differences

the jsanitize for {'e': E.A} that failed, now passes and is consistent with what the MontyEncoder does. Which is good.
The jsanitized document is also used for storing the output. This means that in all the cases we pass from a serialized version of the form {'e': 'A'} to one of the form {'e': {'value': 'A', '@module': '__main__', '@class': 'E', '@version': None}} in the output database. So, any mongodb query of the type {"e": "A"} would be broken and should be modified in {"e.value": "A"}. I am not sure how many such queries could be out there, but I suppose this could definitely break something.
In the case of BM1, going from enum_values=True to enum_values=False in the new version of monty also means that the pydantic model cannot be generated with model_validate from the serialized dictionary. In Jobflow MontyDecoder is used to deserialize, and thus keeps working correctly. So it is not strictly an issue for jobflow, but I am wondering if there are some cases where this could fail (there are in jobflow-remote, but I suppose can be fixed, if needed).

In general there is definitely a benefit to this, but some issues may arise.

I would also suggest to run the tests of emmet (and maybe maggma?) with the new monty version, to check that eveything works fine there as well.

FabiPi3 · 2024-03-19T09:50:58Z

@gpetretto Thanks for your detailed answer. I cannot really judge how large those issues would be, but in my opinion everything should be fixable.

Concerning the tests of emmet and maggma, everything that worked locally for me also worked with the new monty version. At least I couldn't see any issues arising there with the changes in monty. Hope this helps.

FabiPi3 · 2024-04-05T09:08:24Z

The pull request in monty was accepted. We could go ahead and merge this now. If I understand, there are no really big issues preventing this. Please comment again about your opinion.

Else I would close this (and adapt my application code). Thanks

FabiPi3 and others added 8 commits December 28, 2023 01:03

Auto-update pre-commit hooks

986fcb4

Merge branch 'main' into update/pre-commit-autoupdate

c4fa42c

Merge pull request #1 from FabiPi3/update/pre-commit-autoupdate

519948d

Auto-update pre-commit hooks

Merge branch 'materialsproject:main' into main

02d15a0

Merge branch 'materialsproject:main' into main

f8efa10

revert changes in pre-commit file

d03ae82

remove enum value keywords

e6e92d0

fix ruff

978e66f

FabiPi3 mentioned this pull request Mar 13, 2024

Fix jsanitize in contains_flow_or_job #559

Open

FabiPi3 mentioned this pull request Mar 14, 2024

Handle enums properly Matgenix/jobflow-remote#93

Closed

FabiPi3 closed this Jun 25, 2024

utf mentioned this pull request Sep 30, 2024

ValueEnum conflict between emmet and jobflow materialsproject/atomate2#1002

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove enum_values keywords in serialization #565

Remove enum_values keywords in serialization #565

FabiPi3 commented Mar 13, 2024

utf commented Mar 13, 2024

FabiPi3 commented Mar 13, 2024

utf commented Mar 13, 2024

codecov bot commented Mar 13, 2024

FabiPi3 commented Mar 13, 2024

gpetretto commented Mar 15, 2024

FabiPi3 commented Mar 19, 2024

FabiPi3 commented Apr 5, 2024

Remove enum_values keywords in serialization #565

Remove enum_values keywords in serialization #565

Conversation

FabiPi3 commented Mar 13, 2024

utf commented Mar 13, 2024

FabiPi3 commented Mar 13, 2024

utf commented Mar 13, 2024

codecov bot commented Mar 13, 2024

Codecov Report

FabiPi3 commented Mar 13, 2024

gpetretto commented Mar 15, 2024

FabiPi3 commented Mar 19, 2024

FabiPi3 commented Apr 5, 2024