Lazily import `torch/pydantic` in `json` module, speedup `from monty.json import` by 10x #713

DanielYang59 · 2024-10-20T02:45:19Z

Summary

Lazily import torch/pydantic for monty.json, thanks for @janosh's profiling in pymatviz import cost way too high janosh/pymatviz#209 (comment), would significantly speed up from monty.json import MSONable/MontyEncoder/MontyDecoder/...

Performance comparison

Following benchmark done with optional dependencies installed (otherwise import would be skipped), averaged across 10 runs
Python 3.12.7, MacOS 15.0.1 with M3 chip
Profile with: python -X importtime -c "from monty.json import MSONable" 2> monty.log && tuna monty.log

Benchmark Script (by GPT)

import time
import subprocess


def measure_import_time(command: str, repeats: int):
    total_time = 0

    for _ in range(repeats):
        start_time = time.time()

        # Run the Python command in a new process
        subprocess.run(["python", "-c", command], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        end_time = time.time()
        total_time += (end_time - start_time)

    avg_time_ms = (total_time / repeats) * 1000
    print(f"Average import time for '{command}': {avg_time_ms:.3f} ms")

def main():
    commands = [
        "from monty.json import MSONable",
        "from monty.json import MontyEncoder",
        "from monty.json import MontyDecoder",
    ]

    for command in commands:
        measure_import_time(command, repeats=10)

if __name__ == "__main__":
    main()

Import time from master branch

Average import time for 'from monty.json import MSONable': 924.585 ms
Average import time for 'from monty.json import MontyEncoder': 735.797 ms
Average import time for 'from monty.json import MontyDecoder': 720.402 ms

Import time traced for from monty.json import MSONable:

Import time from tip of this PR

Average import time for 'from monty.json import MSONable': 109.053 ms
Average import time for 'from monty.json import MontyEncoder': 111.683 ms
Average import time for 'from monty.json import MontyDecoder': 103.302 ms

After lazily import `torch`

Average import time for 'from monty.json import MSONable': 122.606 ms
Average import time for 'from monty.json import MontyEncoder': 123.456 ms
Average import time for 'from monty.json import MontyDecoder': 117.621 ms

After lazily import `numpy` as well (reverted, as use of np is very common)

Average import time for 'from monty.json import MSONable': 58.586 ms
Average import time for 'from monty.json import MontyEncoder': 59.323 ms
Average import time for 'from monty.json import MontyDecoder': 58.300 ms

After lazily import `pydantic` too (limited improvement from here)

Average import time for 'from monty.json import MSONable': 54.003 ms
Average import time for 'from monty.json import MontyEncoder': 53.038 ms
Average import time for 'from monty.json import MontyDecoder': 52.875 ms

After lazily import `ruamel.yaml` (reverted, no obvious improvement)

Average import time for 'from monty.json import MSONable': 52.769 ms
Average import time for 'from monty.json import MontyEncoder': 52.191 ms
Average import time for 'from monty.json import MontyDecoder': 60.379 ms

Summary by CodeRabbit

New Features
- Enhanced type safety and error handling in JSON processing.
Chores
- Updated GitHub Actions workflow for linting to use the latest Python setup action.
- Modified pre-commit hook configuration to run codespell during the pre-commit phase.
- Updated testing workflow to include Python 3.12 in the testing environment.

codecov · 2024-10-20T02:46:24Z

Codecov Report

Attention: Patch coverage is 78.04878% with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.99%. Comparing base (1270c7b) to head (14c1e3e).

Files with missing lines	Patch %	Lines
src/monty/json.py	78.04%	7 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #713      +/-   ##
==========================================
- Coverage   81.02%   80.99%   -0.03%     
==========================================
  Files          27       27              
  Lines        1586     1589       +3     
  Branches      361      285      -76     
==========================================
+ Hits         1285     1287       +2     
- Misses        236      239       +3     
+ Partials       65       63       -2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/monty/json.py

DanielYang59 · 2024-10-20T06:27:56Z

src/monty/json.py

-        if torch is not None and isinstance(o, torch.Tensor):
-            # Support for Pytorch Tensors.
+        # Support for Pytorch Tensors
+        if _check_type(o, "torch.Tensor"):


Should be the correct type string:

import torch tensor = torch.randn(3, 3) print(type(tensor).mro())

Gives: [<class 'torch.Tensor'>, <class 'torch._C.TensorBase'>, <class 'object'>]

DanielYang59 · 2024-10-20T06:30:00Z

src/monty/json.py

@@ -660,7 +644,7 @@ def default(self, o) -> dict:
                raise AttributeError(e)

        try:
-            if pydantic is not None and isinstance(o, pydantic.BaseModel):
+            if _check_type(o, "pydantic.main.BaseModel"):


from pydantic import BaseModel class MyModel(BaseModel): name: str model_instance = MyModel(name="monty") print(type(model_instance).mro())

Gives: [<class '__main__.MyModel'>, <class 'pydantic.main.BaseModel'>, <class 'object'>]

shyuep · 2024-10-20T15:05:49Z

I don't think we need to lazy import numpy. Numpy is used everywhere. We need to keep in mind that the import cost will be incurred regardless. I am fine with lazy importing time, pydantic and torch since those are used much less commonly.

DanielYang59 · 2024-10-21T01:40:29Z

Cool thanks for the comment! Sure thing I would revert the part for numpy

Can you please review #714 which would unblock this PR (otherwise nesting those import blocks would make it really hard to see what optional dependency is required)? Really appreciate that

shyuep · 2024-10-21T03:22:03Z

I merged 714

DanielYang59 · 2024-10-21T03:22:56Z

Thank you again! Would clean up this PR now

coderabbitai · 2024-10-21T03:33:38Z

Walkthrough

The changes in the pull request focus on enhancing type hinting, error handling, and code structure in the src/monty/json.py file. Key modifications include the use of TYPE_CHECKING for conditional imports, localized error handling for optional imports of pydantic and torch, and updates to function signatures to improve type safety. The _check_type function has been made more flexible, and the MontyEncoder and MontyDecoder classes have been refactored to centralize type checking and improve robustness against missing dependencies. Additionally, updates were made to the GitHub Actions workflow for linting and the pre-commit hook configuration.

Changes

File	Change Summary
src/monty/json.py	- Updated `_load_redirect` to specify return type as `dict`.
	- Modified `_check_type` to accept a type string or a tuple of strings.
	- Refactored `_check_type` return statement for clarity using f-strings.
	- Adjusted `MontyEncoder.default` to use `_check_type` for type checks instead of direct checks.
	- Updated `MontyDecoder.decode` to handle optional imports with try-except blocks.
.github/workflows/lint.yml	- Updated `setup-python` action from version `v4` to `v5`.
.pre-commit-config.yaml	- Changed `codespell` hook's stages from `[commit, commit-msg]` to `[pre-commit, commit-msg]`.
.github/workflows/test.yml	- Updated `python-version` matrix from `["3.9", "3.x"]` to `["3.9", "3.12"]`.

Possibly related PRs

Add missing functools.wraps decorator to deprecated decorator and handle dataclass properly #699: This PR modifies the src/monty/json.py file to enhance type annotations and improve documentation, which aligns with the main PR's focus on type hinting and code structure improvements in the same file.

Poem

In the code where bunnies play,
Type hints hop and errors sway.
With checks so neat, and imports tight,
Our JSON's ready, shining bright!
A leap of joy, a code delight! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)

src/monty/json.py (4)
Line range hint 574-580: Replace deprecated o.type() with o.dtype in MontyEncoder

The use of o.type() is deprecated in recent versions of PyTorch. It's recommended to use o.dtype for compatibility with newer versions.

Apply the following changes to update the code:
 if _check_type(o, "torch.Tensor"):
-    d: dict[str, Any] = {
+    d = {
         "@module": "torch",
         "@class": "Tensor",
-        "dtype": o.type(),
+        "dtype": str(o.dtype),
     }
-    if "Complex" in o.type():
+    if "Complex" in str(o.dtype):
         d["data"] = [o.real.tolist(), o.imag.tolist()]
     else:
         d["data"] = o.numpy().tolist()
     return d
This ensures future compatibility by using o.dtype and converting it to a string where necessary.

925-925: Redundant line without code

Line 925 appears to be an empty or redundant line without any code changes.

Consider removing this line to maintain code cleanliness.

965-965: Unnecessary line without code change

Line 965 does not contain any code changes and can be removed for clarity.

Removing empty lines helps keep the code concise.

Line range hint 1005-1009: Improve handling of Pydantic BaseModel in jsanitize

The check for pydantic.main.BaseModel should handle cases where Pydantic may not be installed, preventing potential ImportError.

Update the code to safely check for Pydantic BaseModel:
+try:
     if _check_type(obj, "pydantic.main.BaseModel"):
         return jsanitize(
             MontyEncoder().default(obj),
             strict=strict,
             allow_bson=allow_bson,
             enum_values=enum_values,
             recursive_msonable=recursive_msonable,
         )
+except ImportError:
+    pass
This ensures that the code does not raise an exception if Pydantic is not available.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 189b6e6 and 226194b.

📒 Files selected for processing (1)

src/monty/json.py (17 hunks)

🧰 Additional context used

🔇 Additional comments (6)

src/monty/json.py (6)

27-28: Efficient use of TYPE_CHECKING to optimize imports

The use of TYPE_CHECKING to conditionally import Any ensures that the import occurs only during type checking, reducing runtime overhead.

44-44: Added return type annotation for _load_redirect

Including the return type -> dict enhances type checking and improves code readability.

55-55: Type annotation for redirect_dict enhances clarity

Specifying the type dict for redirect_dict aids in static analysis and clarifies the expected data structure.

71-71: Updated _check_type signature for flexibility

Allowing type_str to be either a str or tuple[str, ...] makes the function more versatile in handling multiple type strings.

105-105: Optimized type checking in _check_type

The use of a generator expression with any() efficiently checks if the object's type matches any in type_str.

322-326: Conditional import for pydantic_core with clear error handling

Importing pydantic_core within the method and providing a specific error message if it's not available ensures that the dependency is only required when needed, and users are informed appropriately.

src/monty/json.py

DanielYang59 · 2024-10-21T03:52:17Z

@shyuep ready for review, thanks

.pre-commit-config.yaml

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

.github/workflows/test.yml (1)
12-12: Approved: Python version matrix update enhances testing precision.

The change from ["3.9", "3.x"] to ["3.9", "3.12"] ensures testing on both an older stable version (3.9) and the latest Python version (3.12). This aligns well with the benchmarking environment mentioned in the PR objectives and will help catch any Python 3.12-specific issues early.

Consider adding an intermediate version (e.g., 3.11) to the matrix to catch potential issues with versions between 3.9 and 3.12:
python-version: ["3.9", "3.11", "3.12"]
This would provide more comprehensive coverage across Python versions without significantly increasing the testing time.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4a8377b and f33fd5a.

📒 Files selected for processing (1)

.github/workflows/test.yml (1 hunks)

🧰 Additional context used

.github/workflows/test.yml

avoid import torch for type check in MontyEncoder

a7752ee

lazily import torch for MontyDecoder

196dcf9

DanielYang59 changed the title ~~Lazily import torch for MontyEncoder and MontyDecoder~~ Lazily import torch for MontyEncoder and MontyDecoder Oct 20, 2024

also lazily import numpy

f389e24

DanielYang59 force-pushed the lazy-import-torch branch from 952eb94 to f389e24 Compare October 20, 2024 03:28

DanielYang59 added 2 commits October 20, 2024 11:55

lazily import pydantic

2bcc6cd

lazily load ruamel.yaml

acfeb00

DanielYang59 changed the title ~~Lazily import torch for MontyEncoder and MontyDecoder~~ Lazily import torch/numpy/pydantic/ruamel for MontyEncoder and MontyDecoder Oct 20, 2024

DanielYang59 commented Oct 20, 2024

View reviewed changes

src/monty/json.py Outdated Show resolved Hide resolved

DanielYang59 changed the title ~~Lazily import torch/numpy/pydantic/ruamel for MontyEncoder and MontyDecoder~~ Lazily import torch/numpy/pydantic for MontyEncoder and MontyDecoder Oct 20, 2024

DanielYang59 changed the title ~~Lazily import torch/numpy/pydantic for MontyEncoder and MontyDecoder~~ Lazily import torch/numpy/pydantic in json module Oct 20, 2024

revert lazy import ruamel, no obvious improvement

002f5af

DanielYang59 commented Oct 20, 2024

View reviewed changes

DanielYang59 changed the title ~~Lazily import torch/numpy/pydantic in json module~~ Lazily import torch/numpy/pydantic in json module, speedup from monty.json import by 10x Oct 20, 2024

DanielYang59 added 3 commits October 20, 2024 18:38

E1101 should disappear

f67796c

use f-string

69cc7d9

add type

14c1e3e

DanielYang59 mentioned this pull request Oct 20, 2024

import monty.json slowing down core import materialsproject/pymatgen#3793

Closed

1 task

eagerly import numpy

3b8d854

DanielYang59 changed the title ~~Lazily import torch/numpy/pydantic in json module, speedup from monty.json import by 10x~~ Lazily import torch/pydantic in json module, speedup from monty.json import by 10x Oct 21, 2024

DanielYang59 added 2 commits October 21, 2024 11:25

Merge remote-tracking branch 'upstream/master' into lazy-import-torch

c7f012d

push a random change to rerun CI, torch install is flaky as always

226194b

DanielYang59 marked this pull request as ready for review October 21, 2024 03:33

add comment about lazy import

bc58f69

coderabbitai bot reviewed Oct 21, 2024

View reviewed changes

src/monty/json.py Show resolved Hide resolved

src/monty/json.py Show resolved Hide resolved

src/monty/json.py Show resolved Hide resolved

src/monty/json.py Outdated Show resolved Hide resolved

bump setup python version

6f9580c

DanielYang59 force-pushed the lazy-import-torch branch from 6495fa3 to 6f9580c Compare October 21, 2024 04:00

pre-commit migrate-config

4a8377b

DanielYang59 commented Oct 21, 2024

View reviewed changes

.pre-commit-config.yaml Show resolved Hide resolved

pin 3.12

f33fd5a

coderabbitai bot reviewed Oct 21, 2024

View reviewed changes

DanielYang59 commented Oct 21, 2024

View reviewed changes

.github/workflows/test.yml Show resolved Hide resolved

shyuep merged commit 1ff96e8 into materialsvirtuallab:master Oct 21, 2024
8 checks passed

DanielYang59 deleted the lazy-import-torch branch October 22, 2024 02:08

coderabbitai bot mentioned this pull request Oct 22, 2024

Fix line ending handling in reverse_readfile/readline across OS, and not skipping empty lines #712

Merged

3 tasks

DanielYang59 mentioned this pull request Oct 22, 2024

bump monty to use the monty.json import speedup patch, add import time regression test, lazy load some rarely used but costly modules materialsproject/pymatgen#4128

Merged

15 tasks

coderabbitai bot mentioned this pull request Oct 25, 2024

Add unit test for helper function json._check_type #716

Merged

coderabbitai bot mentioned this pull request Dec 8, 2024

zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode, drop .z support after one-year #730

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazily import `torch/pydantic` in `json` module, speedup `from monty.json import` by 10x #713

Lazily import `torch/pydantic` in `json` module, speedup `from monty.json import` by 10x #713

DanielYang59 commented Oct 20, 2024 •

edited by coderabbitai bot

Loading

codecov bot commented Oct 20, 2024 •

edited

Loading

DanielYang59 Oct 20, 2024 •

edited

Loading

DanielYang59 Oct 20, 2024

shyuep commented Oct 20, 2024

DanielYang59 commented Oct 21, 2024 •

edited

Loading

shyuep commented Oct 21, 2024

DanielYang59 commented Oct 21, 2024

coderabbitai bot commented Oct 21, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

DanielYang59 commented Oct 21, 2024

coderabbitai bot left a comment

Lazily import torch/pydantic in json module, speedup from monty.json import by 10x #713

Lazily import torch/pydantic in json module, speedup from monty.json import by 10x #713

Conversation

DanielYang59 commented Oct 20, 2024 • edited by coderabbitai bot Loading

Summary

Performance comparison

Import time from master branch

Import time from tip of this PR

After lazily import torch

After lazily import numpy as well (reverted, as use of np is very common)

After lazily import pydantic too (limited improvement from here)

After lazily import ruamel.yaml (reverted, no obvious improvement)

Summary by CodeRabbit

codecov bot commented Oct 20, 2024 • edited Loading

Codecov Report

DanielYang59 Oct 20, 2024 • edited Loading

Choose a reason for hiding this comment

DanielYang59 Oct 20, 2024

Choose a reason for hiding this comment

shyuep commented Oct 20, 2024

DanielYang59 commented Oct 21, 2024 • edited Loading

shyuep commented Oct 21, 2024

DanielYang59 commented Oct 21, 2024

coderabbitai bot commented Oct 21, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

DanielYang59 commented Oct 21, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

Lazily import `torch/pydantic` in `json` module, speedup `from monty.json import` by 10x #713

Lazily import `torch/pydantic` in `json` module, speedup `from monty.json import` by 10x #713

DanielYang59 commented Oct 20, 2024 •

edited by coderabbitai bot

Loading

After lazily import `torch`

After lazily import `numpy` as well (reverted, as use of np is very common)

After lazily import `pydantic` too (limited improvement from here)

After lazily import `ruamel.yaml` (reverted, no obvious improvement)

codecov bot commented Oct 20, 2024 •

edited

Loading

DanielYang59 Oct 20, 2024 •

edited

Loading

DanielYang59 commented Oct 21, 2024 •

edited

Loading

coderabbitai bot commented Oct 21, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)