Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode, drop .z support after one-year #730

Merged
merged 18 commits into from
Dec 10, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Dec 7, 2024

Summary


Signature change for zopen


Forbid implicit binary/text mode for zopen

  • Emit warning when mode is not set, or b/t is not given in mode

Rationale:
Using implicit binary/text mode (no b/t in mode) with zopen would cause different behaviour for text file (file.txt, as TextIO) vs compressed file (file.gz, as BinaryIO), as default mode for open is "text" while default mode for compressed open is binary, as such we would get different data types from the stream (string vs bytes).

A read example:

with zopen("hello.txt") as file:
    print(file.readline())  # "hello" as string

with zopen("hello.txt.gz") as file:
    print((first_line := file.readline()))  # b'hello\n' as bytes
    print("hello" in first_line)  # TypeError: a bytes-like object is required, not 'str'

A write example:

with zopen("file.gz", mode="w") as file:
    file.write("hello")  # TypeError: memoryview: a bytes-like object is required, not 'str'

Default "UTF-8" encoding in text mode

Rationale see materialsproject/pymatgen#4218 (comment)


Drop support for .z extension after one-year grace period

I don't think Python's built-in gzip could (de)compress a real LZW-algorithm file (not a file (de)compressed by gzip DEFLATE algorithm and named to .z):

The gzip file format is also not to be confused with that of the compress utility, based on LZW, with extension .Z

method is the compression algorithm. Currently, the only supported value is DEFLATED.

For example:

import gzip
import os
import subprocess


with open("hello.txt", mode="wt", encoding="utf-8") as file:
    file.write("hello\n")

# On Ubuntu: `sudo apt install ncompress`, see: https://vapier.github.io/ncompress/
subprocess.run(["compress", "-f", "hello.txt"], check=True)
assert os.path.isfile("hello.txt.Z")

with gzip.open("hello.txt.Z", mode="rt") as file:
    print(file.readlines())  # gzip.BadGzipFile: Not a gzipped file (b'\x1f\x9d')

However user might have already used something like with zopen("text.z", mode="wt") as file: to generate such a file (looks like LZW but actually DEFLATE), and as such we wouldn't drop .z extension directly here, a grace period of one-year would be added.

Copy link

coderabbitai bot commented Dec 7, 2024

Walkthrough

The pull request introduces significant enhancements to file handling and encoding practices across multiple modules in the codebase. Key changes include the introduction of an EncodingWarning class, updates to the zopen function to enforce stricter parameter requirements, and the addition of UTF-8 encoding specifications in various file operations. The modifications aim to improve compatibility with different character sets and enhance error handling, particularly in serialization and deserialization processes. Additionally, several test cases have been updated to reflect these changes.

Changes

File Change Summary
src/monty/io.py Updated zopen function to require explicit mode parameter and added EncodingWarning class. Revised docstring and enhanced error handling for file types.
src/monty/json.py Added encoding="utf-8" to file operations in _load_redirect, save, and _d_from_path. Modified _check_type for better type handling.
src/monty/re.py Updated regrep function to include encoding="utf-8" when calling zopen with reverse flag set to False.
src/monty/serialization.py Added TextIO import and specified encoding="utf-8" in loadfn and dumpfn functions.
tasks.py Updated file operations to include encoding="utf-8" for markdown and pyproject.toml files in various tasks.
tests/test_io.py Restructured tests in TestZopen, added parameterized tests for file handling, and included encoding="utf-8" in multiple test methods.
tests/test_os.py Enhanced specificity in test_makedirs_p to check for error message "exists" when directory creation fails.
tests/test_serialization.py Added encoding="utf-8" to file operations in test_mpk method.
tests/test_shutil.py Updated multiple test methods to specify encoding="utf-8" for file operations.
tests/test_tempfile.py Modified several test methods to include encoding="utf-8" for file opening operations.
tests/test_files/myfile Deleted file containing a simple text line "HelloWorld."
tests/test_files/myfile_txt Completely removed file with the text "HelloWorld."
tests/test_files/real_lzw_file.txt.Z New binary file added, likely containing compressed data using the LZW algorithm.

Assessment against linked issues

Objective Addressed Explanation
Correctly display the symbol of the \Gamma point (#4214) No changes related to band structure display.

Possibly related PRs

🐰 In the land of code where rabbits play,
UTF-8's magic brightens the day.
With zopen's rules now clear and bright,
Encoding warnings take flight!
Let's hop through files, so neat and fine,
In our coding burrow, all will align! 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Dec 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.71%. Comparing base (a3d35a6) to head (eb0a4db).
Report is 28 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #730      +/-   ##
==========================================
+ Coverage   82.57%   82.71%   +0.14%     
==========================================
  Files          27       27              
  Lines        1584     1597      +13     
  Branches      284      288       +4     
==========================================
+ Hits         1308     1321      +13     
  Misses        215      215              
  Partials       61       61              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DanielYang59 DanielYang59 changed the title zopen explicitly use UTF-8 encoding in text mode zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode Dec 7, 2024
@DanielYang59 DanielYang59 changed the title zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode Dec 7, 2024
@DanielYang59 DanielYang59 changed the title zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode zopen changes: forbit implicit binary/text mode, signature change, default UTF-8 encoding in text mode Dec 8, 2024
@DanielYang59 DanielYang59 changed the title zopen changes: forbit implicit binary/text mode, signature change, default UTF-8 encoding in text mode zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode Dec 8, 2024
src/monty/io.py Outdated Show resolved Hide resolved
@DanielYang59 DanielYang59 changed the title zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode, drop .z support after one-year Dec 8, 2024
@DanielYang59

This comment was marked as outdated.

@DanielYang59 DanielYang59 marked this pull request as ready for review December 8, 2024 14:49
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
src/monty/io.py (1)

28-33: Consider making mode a required argument to enforce explicit usage

Currently, mode has a default value of None, and a warning is issued when it's not specified. To encourage explicit specification and prevent future breaking changes, consider removing the default value and making mode a required argument. This enforces best practices now and aligns with the planned deprecation of default mode.

tasks.py (1)

Line range hint 141-146: Consider error handling for pyproject.toml operations

While the UTF-8 encoding is correct, consider adding error handling for file operations.

-    with open("pyproject.toml", encoding="utf-8") as f:
-        contents = f.read()
-        contents = re.sub(r"version = ([\.\d\"]+)", f'version = "{version}"', contents)
-
-    with open("pyproject.toml", "w", encoding="utf-8") as f:
-        f.write(contents)
+    try:
+        with open("pyproject.toml", encoding="utf-8") as f:
+            contents = f.read()
+            contents = re.sub(r"version = ([\.\d\"]+)", f'version = "{version}"', contents)
+
+        with open("pyproject.toml", "w", encoding="utf-8") as f:
+            f.write(contents)
+    except IOError as e:
+        raise RuntimeError(f"Failed to update version in pyproject.toml: {e}") from e
tests/test_tempfile.py (1)

22-22: Add tests for non-ASCII content

While the UTF-8 encoding addition is good, the tests only write ASCII content ("write"). Consider adding tests with non-ASCII content to verify encoding handling.

+    def test_with_non_ascii_content(self):
+        test_content = "Hello, 世界! 🌍"
+        with ScratchDir(self.scratch_root) as d:
+            with open("unicode_test.txt", "w", encoding="utf-8") as f:
+                f.write(test_content)
+            with open("unicode_test.txt", "r", encoding="utf-8") as f:
+                content = f.read()
+            assert content == test_content

Also applies to: 30-30, 52-52, 63-63, 77-77, 86-86, 112-112, 131-131

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a3d35a6 and b4f1a6b.

⛔ Files ignored due to path filters (7)
  • tests/test_files/myfile.gz is excluded by !**/*.gz
  • tests/test_files/myfile_bz2.bz2.gz is excluded by !**/*.gz
  • tests/test_files/myfile_gz.gz is excluded by !**/*.gz
  • tests/test_files/myfile_lzma.lzma.gz is excluded by !**/*.gz
  • tests/test_files/myfile_txt.gz is excluded by !**/*.gz
  • tests/test_files/myfile_xz.xz is excluded by !**/*.xz
  • tests/test_files/myfile_xz.xz.gz is excluded by !**/*.gz
📒 Files selected for processing (12)
  • src/monty/io.py (1 hunks)
  • src/monty/json.py (3 hunks)
  • src/monty/re.py (1 hunks)
  • src/monty/serialization.py (4 hunks)
  • tasks.py (3 hunks)
  • tests/test_files/myfile (0 hunks)
  • tests/test_files/myfile_txt (0 hunks)
  • tests/test_io.py (8 hunks)
  • tests/test_os.py (1 hunks)
  • tests/test_serialization.py (1 hunks)
  • tests/test_shutil.py (9 hunks)
  • tests/test_tempfile.py (7 hunks)
💤 Files with no reviewable changes (2)
  • tests/test_files/myfile
  • tests/test_files/myfile_txt
🔇 Additional comments (14)
src/monty/re.py (1)

52-52: Explicitly specifying UTF-8 encoding ensures consistent text handling

Adding encoding="utf-8" when opening the file in text mode with zopen promotes consistent behavior across different environments and prevents potential decoding errors with non-ASCII characters.

tests/test_serialization.py (1)

85-85: Adding explicit UTF-8 encoding enhances test robustness

Specifying encoding="utf-8" when opening the JSON file ensures the test behaves consistently, especially in environments with different default encodings, and properly handles any Unicode characters.

tasks.py (2)

70-72: LGTM: Proper encoding for index.md modifications

The UTF-8 encoding ensures proper handling of the table of contents and layout configurations.


42-42: LGTM: Consistent UTF-8 encoding for documentation files

The addition of UTF-8 encoding for documentation files is appropriate, especially for handling special characters in API documentation.

Let's verify if there are any special characters in the documentation that would benefit from this change:

Also applies to: 62-62

✅ Verification successful

Adding UTF-8 encoding is necessary and beneficial

The verification confirms that many documentation files contain non-ASCII characters (smart quotes in license text and special characters in documentation). Using UTF-8 encoding is the correct approach to handle these characters properly.

  • Found non-ASCII characters in multiple documentation files including docs/monty.termcolor.md
  • The change ensures consistent and correct handling of special characters across all documentation files
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for non-ASCII characters in documentation files
rg -l "[^\x00-\x7F]" "docs/*.md"

Length of output: 84


Script:

#!/bin/bash
# First, let's find the correct documentation files location
fd -t f ".md$"

# Then check for any non-ASCII characters in these files
fd -t f ".md$" --exec rg -l "[^\x00-\x7F]" {}

Length of output: 1315


Script:

#!/bin/bash
# Let's check what kind of non-ASCII characters are present in one of these files
# to confirm they're actually documentation content
rg --no-line-number "[^\x00-\x7F]" "./docs/monty.termcolor.md" -C 2

Length of output: 713

src/monty/serialization.py (2)

70-70: LGTM: Consistent encoding for file loading

The addition of UTF-8 encoding for text file loading is appropriate and aligns with the project's encoding standardization goals.


123-125: Verify type casting necessity

The type casting of fp to TextIO might be redundant as zopen with "wt" mode should already return a text file object.

Let's check the zopen implementation:

✅ Verification successful

The zopen implementation shows that it returns a union type IO | bz2.BZ2File | gzip.GzipFile | lzma.LZMAFile and in its docstring it specifies the return type as TextIO | BinaryIO | bz2.BZ2File | gzip.GzipFile | lzma.LZMAFile. Since the function can return different types based on the file extension, and the type system might not be able to narrow down the specific return type when used with "wt" mode, the type casting to TextIO is actually necessary for type safety.

Type casting to TextIO is necessary

The explicit cast helps the type checker understand that we're dealing with a text file handle, as zopen can return multiple different file types and the type system might not narrow it down automatically based on the "wt" mode.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check zopen's return type
ast-grep --pattern 'def zopen($$$) -> $_: $$$'

Length of output: 4910

tests/test_shutil.py (3)

29-31: LGTM: Explicit UTF-8 encoding added consistently.

The changes properly specify UTF-8 encoding for all file operations in the TestCopyR class, aligning with the PR's objective to enforce explicit encoding.

Also applies to: 34-36, 56-56


73-73: LGTM: UTF-8 encoding consistently applied to compression tests.

The changes properly specify UTF-8 encoding across all file operations in the TestCompressFileDir class, maintaining consistency with the PR's encoding requirements.

Also applies to: 88-88, 120-120


130-132: LGTM: Comprehensive UTF-8 encoding coverage in gzip tests.

The changes properly specify UTF-8 encoding for all file operations in the TestGzipDir class, including both read and write operations, ensuring consistent encoding behavior.

Also applies to: 155-155, 166-166, 176-176

tests/test_io.py (4)

32-32: LGTM: UTF-8 encoding added to reverse readline tests.

The changes properly specify UTF-8 encoding for all file operations in the TestReverseReadline class, ensuring consistent encoding behavior.

Also applies to: 42-42, 64-64


78-80: LGTM: Proper handling of line endings with UTF-8 encoding.

The changes correctly implement UTF-8 encoding while maintaining proper line ending handling for both Unix/Mac and Windows environments.

Also applies to: 83-83, 93-98, 101-101


181-204: LGTM: Comprehensive test coverage for file operations.

The new parameterized test method provides thorough coverage of:

  • Multiple file extensions (.txt, .bz2, .gz, .xz, .lzma)
  • Both text and binary modes
  • Proper UTF-8 encoding specification
  • Read and write operations

227-263: LGTM: Comprehensive warning behavior testing.

The new test method thoroughly verifies the warning system for:

  • Default UTF-8 encoding behavior
  • Implicit text/binary mode usage
  • Default mode usage
  • Proper handling across different file extensions
src/monty/json.py (1)

449-449: LGTM: UTF-8 encoding added to JSON file operations.

The changes properly specify UTF-8 encoding for JSON file read/write operations, ensuring consistent encoding behavior for serialization and deserialization.

Also applies to: 502-502

src/monty/io.py Outdated Show resolved Hide resolved
src/monty/io.py Outdated Show resolved Hide resolved
tests/test_os.py Show resolved Hide resolved
@DanielYang59 DanielYang59 marked this pull request as draft December 8, 2024 14:58
@DanielYang59 DanielYang59 marked this pull request as ready for review December 8, 2024 15:35
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/monty/io.py (3)

25-25: Consider adding version-specific implementation.

The current implementation backports EncodingWarning from Python 3.10, but it might be better to use the built-in class when available.

Consider this implementation:

-class EncodingWarning(Warning): ...  # Added in Python 3.10
+import sys
+if sys.version_info >= (3, 10):
+    from warnings import EncodingWarning
+else:
+    class EncodingWarning(Warning): ...  # Backport from Python 3.10

28-33: Consider using more specific return type hint.

The current return type hint uses a union of generic IO with specific file types. For better type safety, consider using Union[BinaryIO, TextIO] instead of generic IO.

-) -> IO | bz2.BZ2File | gzip.GzipFile | lzma.LZMAFile:
+) -> Union[BinaryIO, TextIO, bz2.BZ2File, gzip.GzipFile, lzma.LZMAFile]:

58-90: Consider centralizing deadline management.

The deadline is currently hardcoded in the function. Consider moving it to a module-level constant for easier maintenance and consistency.

+# At module level
+IMPLICIT_MODE_DEADLINE = "2025-06-01"

-    _deadline: str = "2025-06-01"
+    _deadline: str = IMPLICIT_MODE_DEADLINE
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b4f1a6b and 53c63f6.

📒 Files selected for processing (1)
  • src/monty/io.py (1 hunks)
🔇 Additional comments (2)
src/monty/io.py (2)

99-108: ⚠️ Potential issue

Implement proper handling for .z files.

As discussed in previous reviews, the current implementation doesn't properly handle .z files, which could lead to incorrect behavior since Python's gzip module cannot handle LZW compression.

Consider raising an exception instead:

 if ext == ".z":
-    # TODO: drop ".z" extension support after 2026-01-01
-    warnings.warn(
-        "Python gzip is not able to (de)compress LZW-compressed files. "
-        "You should rename it to .gz. Support for the '.z' extension will "
-        "be removed after 2026-01-01.",
-        category=FutureWarning,
-        stacklevel=2,
-    )
-    return gzip.open(filename, mode, **kwargs)
+    raise NotImplementedError(
+        "Support for '.z' files is not implemented. "
+        "Python's gzip module cannot decompress LZW-compressed '.z' files. "
+        "Please use an alternative method to handle '.z' files or rename to .gz "
+        "if the file is actually gzip compressed."
+    )

83-90: Verify impact of UTF-8 default encoding.

The change to use UTF-8 as default encoding aligns with PEP 686, but we should verify this doesn't break existing code that might rely on system default encoding.

✅ Verification successful

Let me try a different approach to gather the necessary information.


Based on the search results, I can now generate the final response:

UTF-8 default encoding change is safe to proceed

The codebase already consistently uses UTF-8 encoding across all file operations:

  • All explicit file operations in tests use encoding="utf-8"
  • Text mode operations through zopen() in src/monty/re.py and src/monty/serialization.py already specify UTF-8
  • No instances found of file operations relying on system default encoding
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for potential encoding-sensitive file operations
rg -l "encoding.*=(?!.*utf)" . | while read -r file; do
  echo "=== $file ==="
  rg "encoding.*=(?!.*utf)" "$file" -A 2
done

# Search for potential binary file operations that might be affected
ast-grep --pattern 'zopen($_, "r")' 

Length of output: 384


Script:

#!/bin/bash
# Search for file operations with explicit encoding
rg -A 2 'encoding\s*=' 

# Search for open operations in text mode
ast-grep --pattern 'open($_, "r")'
ast-grep --pattern 'open($_, "rt")'

# Search for potential file read operations
rg -A 2 '\.read\(\)' 

Length of output: 11034

@DanielYang59 DanielYang59 marked this pull request as draft December 9, 2024 08:14
@DanielYang59 DanielYang59 marked this pull request as ready for review December 10, 2024 10:37
category=EncodingWarning,
stacklevel=2,
)
kwargs["encoding"] = "utf-8"

_name, ext = os.path.splitext(filename)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess splittext got updated at some point? It works directly with str as path now

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/monty/io.py (3)

25-25: Consider enhancing the EncodingWarning documentation

While the implementation is correct, adding a docstring would help clarify its purpose and usage, especially since it appears to be a backport of Python 3.10's EncodingWarning.

-class EncodingWarning(Warning): ...  # Added in Python 3.10
+class EncodingWarning(Warning):
+    """Warning category for encoding-related warnings.
+
+    This is a backport of Python 3.10's EncodingWarning to maintain
+    compatibility with earlier Python versions.
+    """
+    ...

58-58: Consider using a datetime object for the deadline

The deadline is currently a hardcoded string. Consider using a datetime object for better validation and comparison capabilities.

-_deadline: str = "2025-06-01"
+from datetime import datetime
+_deadline: datetime = datetime(2025, 6, 1)

95-110: Consider using a dictionary for extension mapping

The current implementation uses multiple if statements for extension mapping. Consider using a dictionary for better maintainability and performance.

+_COMPRESSION_HANDLERS = {
+    ".bz2": bz2.open,
+    ".gz": gzip.open,
+    ".xz": lzma.open,
+    ".lzma": lzma.open,
+}

-if ext == ".bz2":
-    return bz2.open(filename, mode, **kwargs)
-if ext == ".gz":
-    return gzip.open(filename, mode, **kwargs)
-if ext == ".z":
-    # TODO: drop ".z" extension support after 2026-01-01
-    warnings.warn(
-        "Python gzip is not able to (de)compress LZW-compressed files. "
-        "You should rename it to .gz. Support for the '.z' extension will "
-        "be removed after 2026-01-01.",
-        category=FutureWarning,
-        stacklevel=2,
-    )
-    return gzip.open(filename, mode, **kwargs)
-if ext in {".xz", ".lzma"}:
-    return lzma.open(filename, mode, **kwargs)
+handler = _COMPRESSION_HANDLERS.get(ext)
+if handler:
+    return handler(filename, mode, **kwargs)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 53c63f6 and eb0a4db.

📒 Files selected for processing (3)
  • src/monty/io.py (1 hunks)
  • tests/test_files/real_lzw_file.txt.Z (1 hunks)
  • tests/test_io.py (8 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/test_files/real_lzw_file.txt.Z
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_io.py
🔇 Additional comments (4)
src/monty/io.py (4)

17-22: LGTM: Well-organized imports with proper type hints

The new imports are properly organized and the type hints are correctly conditioned with TYPE_CHECKING for runtime optimization.


28-56: LGTM: Well-designed function signature with clear documentation

The updated signature with positional-only parameter and comprehensive type hints improves clarity. The docstring effectively communicates the upcoming breaking changes and usage requirements.


82-90: LGTM: Proper implementation of UTF-8 default encoding

The implementation correctly follows PEP 686 by defaulting to UTF-8 encoding and warning users about implicit encoding usage.


99-108: ⚠️ Potential issue

Consider immediate deprecation of .z extension

Since Python's gzip module cannot properly handle LZW-compressed .z files, continuing to support them might lead to silent corruption or incorrect behavior. Consider raising a NotImplementedError instead of just warning.

 if ext == ".z":
-    # TODO: drop ".z" extension support after 2026-01-01
-    warnings.warn(
-        "Python gzip is not able to (de)compress LZW-compressed files. "
-        "You should rename it to .gz. Support for the '.z' extension will "
-        "be removed after 2026-01-01.",
-        category=FutureWarning,
-        stacklevel=2,
-    )
-    return gzip.open(filename, mode, **kwargs)
+    raise NotImplementedError(
+        "Support for '.z' files is not implemented as Python's gzip module "
+        "cannot handle LZW-compressed files. Please use '.gz' extension instead."
+    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

For band structure, how to correctly display the symbol of the \Gamma point
2 participants