Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily import torch/pydantic in json module, speedup from monty.json import by 10x #713

Merged
merged 16 commits into from
Oct 21, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Oct 20, 2024

Summary


Performance comparison

  • Following benchmark done with optional dependencies installed (otherwise import would be skipped), averaged across 10 runs
  • Python 3.12.7, MacOS 15.0.1 with M3 chip
  • Profile with: python -X importtime -c "from monty.json import MSONable" 2> monty.log && tuna monty.log
Benchmark Script (by GPT)
import time
import subprocess


def measure_import_time(command: str, repeats: int):
    total_time = 0

    for _ in range(repeats):
        start_time = time.time()

        # Run the Python command in a new process
        subprocess.run(["python", "-c", command], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        end_time = time.time()
        total_time += (end_time - start_time)

    avg_time_ms = (total_time / repeats) * 1000
    print(f"Average import time for '{command}': {avg_time_ms:.3f} ms")

def main():
    commands = [
        "from monty.json import MSONable",
        "from monty.json import MontyEncoder",
        "from monty.json import MontyDecoder",
    ]

    for command in commands:
        measure_import_time(command, repeats=10)

if __name__ == "__main__":
    main()

Import time from master branch

Average import time for 'from monty.json import MSONable': 924.585 ms
Average import time for 'from monty.json import MontyEncoder': 735.797 ms
Average import time for 'from monty.json import MontyDecoder': 720.402 ms

Import time traced for from monty.json import MSONable:
image

Import time from tip of this PR

Average import time for 'from monty.json import MSONable': 109.053 ms
Average import time for 'from monty.json import MontyEncoder': 111.683 ms
Average import time for 'from monty.json import MontyDecoder': 103.302 ms
image

After lazily import torch

Average import time for 'from monty.json import MSONable': 122.606 ms
Average import time for 'from monty.json import MontyEncoder': 123.456 ms
Average import time for 'from monty.json import MontyDecoder': 117.621 ms
image

After lazily import numpy as well (reverted, as use of np is very common)

Average import time for 'from monty.json import MSONable': 58.586 ms
Average import time for 'from monty.json import MontyEncoder': 59.323 ms
Average import time for 'from monty.json import MontyDecoder': 58.300 ms
image

After lazily import pydantic too (limited improvement from here)

Average import time for 'from monty.json import MSONable': 54.003 ms
Average import time for 'from monty.json import MontyEncoder': 53.038 ms
Average import time for 'from monty.json import MontyDecoder': 52.875 ms
image

After lazily import ruamel.yaml (reverted, no obvious improvement)

Average import time for 'from monty.json import MSONable': 52.769 ms
Average import time for 'from monty.json import MontyEncoder': 52.191 ms
Average import time for 'from monty.json import MontyDecoder': 60.379 ms
image

Summary by CodeRabbit

  • New Features
    • Enhanced type safety and error handling in JSON processing.
  • Chores
    • Updated GitHub Actions workflow for linting to use the latest Python setup action.
    • Modified pre-commit hook configuration to run codespell during the pre-commit phase.
    • Updated testing workflow to include Python 3.12 in the testing environment.

Copy link

codecov bot commented Oct 20, 2024

Codecov Report

Attention: Patch coverage is 78.04878% with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.99%. Comparing base (1270c7b) to head (14c1e3e).

Files with missing lines Patch % Lines
src/monty/json.py 78.04% 7 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #713      +/-   ##
==========================================
- Coverage   81.02%   80.99%   -0.03%     
==========================================
  Files          27       27              
  Lines        1586     1589       +3     
  Branches      361      285      -76     
==========================================
+ Hits         1285     1287       +2     
- Misses        236      239       +3     
+ Partials       65       63       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DanielYang59 DanielYang59 changed the title Lazily import torch for MontyEncoder and MontyDecoder Lazily import torch for MontyEncoder and MontyDecoder Oct 20, 2024
@DanielYang59 DanielYang59 changed the title Lazily import torch for MontyEncoder and MontyDecoder Lazily import torch/numpy/pydantic/ruamel for MontyEncoder and MontyDecoder Oct 20, 2024
src/monty/json.py Outdated Show resolved Hide resolved
@DanielYang59 DanielYang59 changed the title Lazily import torch/numpy/pydantic/ruamel for MontyEncoder and MontyDecoder Lazily import torch/numpy/pydantic for MontyEncoder and MontyDecoder Oct 20, 2024
@DanielYang59 DanielYang59 changed the title Lazily import torch/numpy/pydantic for MontyEncoder and MontyDecoder Lazily import torch/numpy/pydantic in json module Oct 20, 2024
if torch is not None and isinstance(o, torch.Tensor):
# Support for Pytorch Tensors.
# Support for Pytorch Tensors
if _check_type(o, "torch.Tensor"):
Copy link
Contributor Author

@DanielYang59 DanielYang59 Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be the correct type string:

import torch


tensor = torch.randn(3, 3)
print(type(tensor).mro())

Gives: [<class 'torch.Tensor'>, <class 'torch._C.TensorBase'>, <class 'object'>]

@@ -660,7 +644,7 @@ def default(self, o) -> dict:
raise AttributeError(e)

try:
if pydantic is not None and isinstance(o, pydantic.BaseModel):
if _check_type(o, "pydantic.main.BaseModel"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pydantic import BaseModel


class MyModel(BaseModel):
    name: str

model_instance = MyModel(name="monty")
print(type(model_instance).mro())

Gives: [<class '__main__.MyModel'>, <class 'pydantic.main.BaseModel'>, <class 'object'>]

@DanielYang59 DanielYang59 changed the title Lazily import torch/numpy/pydantic in json module Lazily import torch/numpy/pydantic in json module, speedup from monty.json import by 10x Oct 20, 2024
@shyuep
Copy link
Contributor

shyuep commented Oct 20, 2024

I don't think we need to lazy import numpy. Numpy is used everywhere. We need to keep in mind that the import cost will be incurred regardless. I am fine with lazy importing time, pydantic and torch since those are used much less commonly.

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Oct 21, 2024

Cool thanks for the comment! Sure thing I would revert the part for numpy

Can you please review #714 which would unblock this PR (otherwise nesting those import blocks would make it really hard to see what optional dependency is required)? Really appreciate that

@DanielYang59 DanielYang59 changed the title Lazily import torch/numpy/pydantic in json module, speedup from monty.json import by 10x Lazily import torch/pydantic in json module, speedup from monty.json import by 10x Oct 21, 2024
@shyuep
Copy link
Contributor

shyuep commented Oct 21, 2024

I merged 714

@DanielYang59
Copy link
Contributor Author

Thank you again! Would clean up this PR now

@DanielYang59 DanielYang59 marked this pull request as ready for review October 21, 2024 03:33
Copy link

coderabbitai bot commented Oct 21, 2024

Walkthrough

The changes in the pull request focus on enhancing type hinting, error handling, and code structure in the src/monty/json.py file. Key modifications include the use of TYPE_CHECKING for conditional imports, localized error handling for optional imports of pydantic and torch, and updates to function signatures to improve type safety. The _check_type function has been made more flexible, and the MontyEncoder and MontyDecoder classes have been refactored to centralize type checking and improve robustness against missing dependencies. Additionally, updates were made to the GitHub Actions workflow for linting and the pre-commit hook configuration.

Changes

File Change Summary
src/monty/json.py - Updated _load_redirect to specify return type as dict.
- Modified _check_type to accept a type string or a tuple of strings.
- Refactored _check_type return statement for clarity using f-strings.
- Adjusted MontyEncoder.default to use _check_type for type checks instead of direct checks.
- Updated MontyDecoder.decode to handle optional imports with try-except blocks.
.github/workflows/lint.yml - Updated setup-python action from version v4 to v5.
.pre-commit-config.yaml - Changed codespell hook's stages from [commit, commit-msg] to [pre-commit, commit-msg].
.github/workflows/test.yml - Updated python-version matrix from ["3.9", "3.x"] to ["3.9", "3.12"].

Possibly related PRs

Poem

In the code where bunnies play,
Type hints hop and errors sway.
With checks so neat, and imports tight,
Our JSON's ready, shining bright!
A leap of joy, a code delight! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
src/monty/json.py (4)

Line range hint 574-580: Replace deprecated o.type() with o.dtype in MontyEncoder

The use of o.type() is deprecated in recent versions of PyTorch. It's recommended to use o.dtype for compatibility with newer versions.

Apply the following changes to update the code:

 if _check_type(o, "torch.Tensor"):
-    d: dict[str, Any] = {
+    d = {
         "@module": "torch",
         "@class": "Tensor",
-        "dtype": o.type(),
+        "dtype": str(o.dtype),
     }
-    if "Complex" in o.type():
+    if "Complex" in str(o.dtype):
         d["data"] = [o.real.tolist(), o.imag.tolist()]
     else:
         d["data"] = o.numpy().tolist()
     return d

This ensures future compatibility by using o.dtype and converting it to a string where necessary.


925-925: Redundant line without code

Line 925 appears to be an empty or redundant line without any code changes.

Consider removing this line to maintain code cleanliness.


965-965: Unnecessary line without code change

Line 965 does not contain any code changes and can be removed for clarity.

Removing empty lines helps keep the code concise.


Line range hint 1005-1009: Improve handling of Pydantic BaseModel in jsanitize

The check for pydantic.main.BaseModel should handle cases where Pydantic may not be installed, preventing potential ImportError.

Update the code to safely check for Pydantic BaseModel:

+try:
     if _check_type(obj, "pydantic.main.BaseModel"):
         return jsanitize(
             MontyEncoder().default(obj),
             strict=strict,
             allow_bson=allow_bson,
             enum_values=enum_values,
             recursive_msonable=recursive_msonable,
         )
+except ImportError:
+    pass

This ensures that the code does not raise an exception if Pydantic is not available.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 189b6e6 and 226194b.

📒 Files selected for processing (1)
  • src/monty/json.py (17 hunks)
🧰 Additional context used
🔇 Additional comments (6)
src/monty/json.py (6)

27-28: Efficient use of TYPE_CHECKING to optimize imports

The use of TYPE_CHECKING to conditionally import Any ensures that the import occurs only during type checking, reducing runtime overhead.


44-44: Added return type annotation for _load_redirect

Including the return type -> dict enhances type checking and improves code readability.


55-55: Type annotation for redirect_dict enhances clarity

Specifying the type dict for redirect_dict aids in static analysis and clarifies the expected data structure.


71-71: Updated _check_type signature for flexibility

Allowing type_str to be either a str or tuple[str, ...] makes the function more versatile in handling multiple type strings.


105-105: Optimized type checking in _check_type

The use of a generator expression with any() efficiently checks if the object's type matches any in type_str.


322-326: Conditional import for pydantic_core with clear error handling

Importing pydantic_core within the method and providing a specific error message if it's not available ensures that the dependency is only required when needed, and users are informed appropriately.

src/monty/json.py Show resolved Hide resolved
src/monty/json.py Show resolved Hide resolved
src/monty/json.py Show resolved Hide resolved
src/monty/json.py Outdated Show resolved Hide resolved
@DanielYang59
Copy link
Contributor Author

@shyuep ready for review, thanks

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
.github/workflows/test.yml (1)

12-12: Approved: Python version matrix update enhances testing precision.

The change from ["3.9", "3.x"] to ["3.9", "3.12"] ensures testing on both an older stable version (3.9) and the latest Python version (3.12). This aligns well with the benchmarking environment mentioned in the PR objectives and will help catch any Python 3.12-specific issues early.

Consider adding an intermediate version (e.g., 3.11) to the matrix to catch potential issues with versions between 3.9 and 3.12:

python-version: ["3.9", "3.11", "3.12"]

This would provide more comprehensive coverage across Python versions without significantly increasing the testing time.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4a8377b and f33fd5a.

📒 Files selected for processing (1)
  • .github/workflows/test.yml (1 hunks)
🧰 Additional context used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants