-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching parsed output files #273
Conversation
This generally looks good to me. |
Thanks for looking into this. |
@gpetretto Looks like this fell under the radar. If you could resolve the conflicts, we'll get this merged! |
Thanks for getting back on this @janosh. It should be fine now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about instead of a new io
module, we just define
ChachedOutcar = tracked_lru_cache(Outcar)
ChachedVasprun = tracked_lru_cache(Vasprun)
directly in custodian/vasp/handlers.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done, but since this is also used in the validators
module they should be imported from the handlers
. I think that having a generic io module could be fine, but I have no problem moving them to handlers.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess no need to move to handlers.py
.
@pytest.fixture(autouse=True) | ||
def _clear_tracked_cache(): | ||
""" | ||
Clear the cache of the stored functions between the tests. | ||
""" | ||
from custodian.utils import tracked_lru_cache | ||
|
||
tracked_lru_cache.tracked_cache_clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this fixture currently runs on every test, not just the handler tests which would be unnecessary work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can move it to custodian/vasp/handlers.py
but keep autouse=True
to only auto apply it to those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is also used in the validators, I think the fixture should either be duplicated in the handlers and validators files, or added it explicitly to the tests that need it, removing the autouse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably duplicating it in both files makes it more explicit too. People are more likely to see it and become aware this is happening.
@gpetretto Perhaps you could address my comments in a new PR? |
Hi @janosh , sorry for the delay. I can indeed adress these comments in a separate PR. Can you maybe clarify what would be your preferences in the comments above? |
Summary
Following up on #268 I wrote an initial implementation for caching the output files. It is based on a modified version of the
lru_cache
decorator. This should make it easy switch to helper function for storing different properties, while still allowingCustodian
to clear the cache after each time the loop on the handlers/validators has been executed.One potential drawback of the implementation is that the set of cached functions is stored as a class attribute of
tracked_lru_cache
. This means for example that in a multithreading execution (memory shared), with multiple instances of Custodian running, the cache will be cleared more frequently than expected. It should not lead to overlaps, due to the caching being based on the absolute paths. However, a multithreading execution does not seem very likely to be used with Cusodian. No problem will happen in a multiprocessing execution of multiple instances of Custodian (no memory shared).Making the set of cached function a property of a single Custodian instance is also feasible, but it will need to be passed to the handlers so that the decorated functions can be tracked, making the implementation of the cached functions a bit more involved.
I would like to first have a feedback on this solution. If it is fine I will proceed to write specific tests for the new functionality.
If there is also an interest for switching to the parsing of specific values, instead of the full vasprun.xml/OUTCAR, I will do that in a separate PR.