`zopen` changes: forbid implicit binary/text `mode`, signature change, default UTF-8 `encoding` in text mode, drop `.z` support after one-year #730

DanielYang59 · 2024-12-07T08:07:35Z

Summary

Use UTF-8 encoding as default in text mode when encoding is not given, follow up Explicit UTF-8 encoding for VASP input files with zopen, and open for other text files materialsproject/pymatgen#4218, to fix For band structure, how to correctly display the symbol of the \Gamma point materialsproject/pymatgen#4214
Forbid implicit "binary/text" mode, the plan is giving a half-year grace time (I prefer to enforce a shorter grace time as this is not deprecation but more a fix for a dangerous behaviour), after that passing a mode without b/t would directly trigger an exception
zopen signature change
Test enhance
Test downstream pymatgen: commit, CI log

Signature change for `zopen`

Force file/filename as positional, AFAIK, there is no usage as with zopen(filename=) (i.e. filename as kwarg).
- Rationale: Different file var names in open (file) and gzip/bz2/lzma.open (filename)
Make arguments other than filename/mode keyword only.
- Rationale: Completely different signature after filename and mode, passing as positional args would give very unpredictable results:

Forbid implicit binary/text mode for `zopen`

Emit warning when mode is not set, or b/t is not given in mode

Rationale:
Using implicit binary/text mode (no b/t in mode) with zopen would cause different behaviour for text file (file.txt, as TextIO) vs compressed file (file.gz, as BinaryIO), as default mode for open is "text" while default mode for compressed open is binary, as such we would get different data types from the stream (string vs bytes).

A read example:

with zopen("hello.txt") as file:
    print(file.readline())  # "hello" as string

with zopen("hello.txt.gz") as file:
    print((first_line := file.readline()))  # b'hello\n' as bytes
    print("hello" in first_line)  # TypeError: a bytes-like object is required, not 'str'

A write example:

with zopen("file.gz", mode="w") as file:
    file.write("hello")  # TypeError: memoryview: a bytes-like object is required, not 'str'

Default "UTF-8" `encoding` in text mode

Rationale see materialsproject/pymatgen#4218 (comment)

Drop support for `.z` extension after one-year grace period

I don't think Python's built-in gzip could (de)compress a real LZW-algorithm file (not a file (de)compressed by gzip DEFLATE algorithm and named to .z):

The gzip file format is also not to be confused with that of the compress utility, based on LZW, with extension .Z

method is the compression algorithm. Currently, the only supported value is DEFLATED.

For example:

import gzip
import os
import subprocess


with open("hello.txt", mode="wt", encoding="utf-8") as file:
    file.write("hello\n")

# On Ubuntu: `sudo apt install ncompress`, see: https://vapier.github.io/ncompress/
subprocess.run(["compress", "-f", "hello.txt"], check=True)
assert os.path.isfile("hello.txt.Z")

with gzip.open("hello.txt.Z", mode="rt") as file:
    print(file.readlines())  # gzip.BadGzipFile: Not a gzipped file (b'\x1f\x9d')

However user might have already used something like with zopen("text.z", mode="wt") as file: to generate such a file (looks like LZW but actually DEFLATE), and as such we wouldn't drop .z extension directly here, a grace period of one-year would be added.

coderabbitai · 2024-12-07T08:07:42Z

Walkthrough

The pull request introduces significant enhancements to file handling and encoding practices across multiple modules in the codebase. Key changes include the introduction of an EncodingWarning class, updates to the zopen function to enforce stricter parameter requirements, and the addition of UTF-8 encoding specifications in various file operations. The modifications aim to improve compatibility with different character sets and enhance error handling, particularly in serialization and deserialization processes. Additionally, several test cases have been updated to reflect these changes.

Changes

File	Change Summary
`src/monty/io.py`	Updated `zopen` function to require explicit `mode` parameter and added `EncodingWarning` class. Revised docstring and enhanced error handling for file types.
`src/monty/json.py`	Added `encoding="utf-8"` to file operations in `_load_redirect`, `save`, and `_d_from_path`. Modified `_check_type` for better type handling.
`src/monty/re.py`	Updated `regrep` function to include `encoding="utf-8"` when calling `zopen` with `reverse` flag set to `False`.
`src/monty/serialization.py`	Added `TextIO` import and specified `encoding="utf-8"` in `loadfn` and `dumpfn` functions.
`tasks.py`	Updated file operations to include `encoding="utf-8"` for markdown and `pyproject.toml` files in various tasks.
`tests/test_io.py`	Restructured tests in `TestZopen`, added parameterized tests for file handling, and included `encoding="utf-8"` in multiple test methods.
`tests/test_os.py`	Enhanced specificity in `test_makedirs_p` to check for error message "exists" when directory creation fails.
`tests/test_serialization.py`	Added `encoding="utf-8"` to file operations in `test_mpk` method.
`tests/test_shutil.py`	Updated multiple test methods to specify `encoding="utf-8"` for file operations.
`tests/test_tempfile.py`	Modified several test methods to include `encoding="utf-8"` for file opening operations.
`tests/test_files/myfile`	Deleted file containing a simple text line "HelloWorld."
`tests/test_files/myfile_txt`	Completely removed file with the text "HelloWorld."
`tests/test_files/real_lzw_file.txt.Z`	New binary file added, likely containing compressed data using the LZW algorithm.

Assessment against linked issues

Objective	Addressed	Explanation
Correctly display the symbol of the \Gamma point (#4214)	❌	No changes related to band structure display.

Possibly related PRs

Fix line ending in reverse_readfile/readline in Windows #700: Modifications to the src/monty/io.py file, particularly updates to the reverse_readfile and reverse_readline functions, are relevant as they also involve file handling and may interact with the changes made to the zopen function.
Lazily import torch/pydantic in json module, speedup from monty.json import by 10x #713: Changes in src/monty/json.py regarding handling of optional imports and type hinting improvements are related to the main PR's focus on enhancing the zopen function's functionality and error handling.
Declare required and optional dependency #714: Updates to dependency management in pyproject.toml may indirectly relate to the main PR by ensuring that required libraries for file handling (like lzma) are properly declared, supporting the functionality of the zopen function.

🐰 In the land of code where rabbits play,
UTF-8's magic brightens the day.
With zopen's rules now clear and bright,
Encoding warnings take flight!
Let's hop through files, so neat and fine,
In our coding burrow, all will align! 🐇

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

codecov · 2024-12-07T08:08:56Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.71%. Comparing base (a3d35a6) to head (eb0a4db).
Report is 28 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #730      +/-   ##
==========================================
+ Coverage   82.57%   82.71%   +0.14%     
==========================================
  Files          27       27              
  Lines        1584     1597      +13     
  Branches      284      288       +4     
==========================================
+ Hits         1308     1321      +13     
  Misses        215      215              
  Partials       61       61

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/monty/io.py

…rator

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)

src/monty/io.py (1)

28-33: Consider making mode a required argument to enforce explicit usage

Currently, mode has a default value of None, and a warning is issued when it's not specified. To encourage explicit specification and prevent future breaking changes, consider removing the default value and making mode a required argument. This enforces best practices now and aligns with the planned deprecation of default mode.
tasks.py (1)
Line range hint 141-146: Consider error handling for pyproject.toml operations

While the UTF-8 encoding is correct, consider adding error handling for file operations.
-    with open("pyproject.toml", encoding="utf-8") as f:
-        contents = f.read()
-        contents = re.sub(r"version = ([\.\d\"]+)", f'version = "{version}"', contents)
-
-    with open("pyproject.toml", "w", encoding="utf-8") as f:
-        f.write(contents)
+    try:
+        with open("pyproject.toml", encoding="utf-8") as f:
+            contents = f.read()
+            contents = re.sub(r"version = ([\.\d\"]+)", f'version = "{version}"', contents)
+
+        with open("pyproject.toml", "w", encoding="utf-8") as f:
+            f.write(contents)
+    except IOError as e:
+        raise RuntimeError(f"Failed to update version in pyproject.toml: {e}") from e
tests/test_tempfile.py (1)
22-22: Add tests for non-ASCII content

While the UTF-8 encoding addition is good, the tests only write ASCII content ("write"). Consider adding tests with non-ASCII content to verify encoding handling.
+    def test_with_non_ascii_content(self):
+        test_content = "Hello, 世界! 🌍"
+        with ScratchDir(self.scratch_root) as d:
+            with open("unicode_test.txt", "w", encoding="utf-8") as f:
+                f.write(test_content)
+            with open("unicode_test.txt", "r", encoding="utf-8") as f:
+                content = f.read()
+            assert content == test_content
Also applies to: 30-30, 52-52, 63-63, 77-77, 86-86, 112-112, 131-131

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a3d35a6 and b4f1a6b.

⛔ Files ignored due to path filters (7)

tests/test_files/myfile.gz is excluded by !**/*.gz
tests/test_files/myfile_bz2.bz2.gz is excluded by !**/*.gz
tests/test_files/myfile_gz.gz is excluded by !**/*.gz
tests/test_files/myfile_lzma.lzma.gz is excluded by !**/*.gz
tests/test_files/myfile_txt.gz is excluded by !**/*.gz
tests/test_files/myfile_xz.xz is excluded by !**/*.xz
tests/test_files/myfile_xz.xz.gz is excluded by !**/*.gz

📒 Files selected for processing (12)

src/monty/io.py (1 hunks)
src/monty/json.py (3 hunks)
src/monty/re.py (1 hunks)
src/monty/serialization.py (4 hunks)
tasks.py (3 hunks)
tests/test_files/myfile (0 hunks)
tests/test_files/myfile_txt (0 hunks)
tests/test_io.py (8 hunks)
tests/test_os.py (1 hunks)
tests/test_serialization.py (1 hunks)
tests/test_shutil.py (9 hunks)
tests/test_tempfile.py (7 hunks)

💤 Files with no reviewable changes (2)

tests/test_files/myfile
tests/test_files/myfile_txt

🔇 Additional comments (14)

src/monty/re.py (1)

52-52: Explicitly specifying UTF-8 encoding ensures consistent text handling

Adding encoding="utf-8" when opening the file in text mode with zopen promotes consistent behavior across different environments and prevents potential decoding errors with non-ASCII characters.

tests/test_serialization.py (1)

85-85: Adding explicit UTF-8 encoding enhances test robustness

Specifying encoding="utf-8" when opening the JSON file ensures the test behaves consistently, especially in environments with different default encodings, and properly handles any Unicode characters.

tasks.py (2)

70-72: LGTM: Proper encoding for index.md modifications

The UTF-8 encoding ensures proper handling of the table of contents and layout configurations.

42-42: LGTM: Consistent UTF-8 encoding for documentation files

The addition of UTF-8 encoding for documentation files is appropriate, especially for handling special characters in API documentation.

Let's verify if there are any special characters in the documentation that would benefit from this change:

Also applies to: 62-62

✅ Verification successful

Adding UTF-8 encoding is necessary and beneficial

The verification confirms that many documentation files contain non-ASCII characters (smart quotes in license text and special characters in documentation). Using UTF-8 encoding is the correct approach to handle these characters properly.

Found non-ASCII characters in multiple documentation files including docs/monty.termcolor.md
The change ensures consistent and correct handling of special characters across all documentation files

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for non-ASCII characters in documentation files
rg -l "[^\x00-\x7F]" "docs/*.md"

Length of output: 84

Script:

#!/bin/bash
# First, let's find the correct documentation files location
fd -t f ".md$"

# Then check for any non-ASCII characters in these files
fd -t f ".md$" --exec rg -l "[^\x00-\x7F]" {}

Length of output: 1315

Script:

#!/bin/bash
# Let's check what kind of non-ASCII characters are present in one of these files
# to confirm they're actually documentation content
rg --no-line-number "[^\x00-\x7F]" "./docs/monty.termcolor.md" -C 2

Length of output: 713

src/monty/serialization.py (2)

70-70: LGTM: Consistent encoding for file loading

The addition of UTF-8 encoding for text file loading is appropriate and aligns with the project's encoding standardization goals.

123-125: Verify type casting necessity

The type casting of fp to TextIO might be redundant as zopen with "wt" mode should already return a text file object.

Let's check the zopen implementation:

✅ Verification successful

The zopen implementation shows that it returns a union type IO | bz2.BZ2File | gzip.GzipFile | lzma.LZMAFile and in its docstring it specifies the return type as TextIO | BinaryIO | bz2.BZ2File | gzip.GzipFile | lzma.LZMAFile. Since the function can return different types based on the file extension, and the type system might not be able to narrow down the specific return type when used with "wt" mode, the type casting to TextIO is actually necessary for type safety.

Type casting to TextIO is necessary

The explicit cast helps the type checker understand that we're dealing with a text file handle, as zopen can return multiple different file types and the type system might not narrow it down automatically based on the "wt" mode.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check zopen's return type
ast-grep --pattern 'def zopen($$$) -> $_: $$$'

Length of output: 4910

tests/test_shutil.py (3)

29-31: LGTM: Explicit UTF-8 encoding added consistently.

The changes properly specify UTF-8 encoding for all file operations in the TestCopyR class, aligning with the PR's objective to enforce explicit encoding.

Also applies to: 34-36, 56-56

73-73: LGTM: UTF-8 encoding consistently applied to compression tests.

The changes properly specify UTF-8 encoding across all file operations in the TestCompressFileDir class, maintaining consistency with the PR's encoding requirements.

Also applies to: 88-88, 120-120

130-132: LGTM: Comprehensive UTF-8 encoding coverage in gzip tests.

The changes properly specify UTF-8 encoding for all file operations in the TestGzipDir class, including both read and write operations, ensuring consistent encoding behavior.

Also applies to: 155-155, 166-166, 176-176

tests/test_io.py (4)

32-32: LGTM: UTF-8 encoding added to reverse readline tests.

The changes properly specify UTF-8 encoding for all file operations in the TestReverseReadline class, ensuring consistent encoding behavior.

Also applies to: 42-42, 64-64

78-80: LGTM: Proper handling of line endings with UTF-8 encoding.

The changes correctly implement UTF-8 encoding while maintaining proper line ending handling for both Unix/Mac and Windows environments.

Also applies to: 83-83, 93-98, 101-101

181-204: LGTM: Comprehensive test coverage for file operations.

The new parameterized test method provides thorough coverage of:

Multiple file extensions (.txt, .bz2, .gz, .xz, .lzma)
Both text and binary modes
Proper UTF-8 encoding specification
Read and write operations

227-263: LGTM: Comprehensive warning behavior testing.

The new test method thoroughly verifies the warning system for:

Default UTF-8 encoding behavior
Implicit text/binary mode usage
Default mode usage
Proper handling across different file extensions

src/monty/json.py (1)

449-449: LGTM: UTF-8 encoding added to JSON file operations.

The changes properly specify UTF-8 encoding for JSON file read/write operations, ensuring consistent encoding behavior for serialization and deserialization.

Also applies to: 502-502

src/monty/io.py

tests/test_os.py

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

src/monty/io.py (3)
25-25: Consider adding version-specific implementation.

The current implementation backports EncodingWarning from Python 3.10, but it might be better to use the built-in class when available.

Consider this implementation:
-class EncodingWarning(Warning): ...  # Added in Python 3.10
+import sys
+if sys.version_info >= (3, 10):
+    from warnings import EncodingWarning
+else:
+    class EncodingWarning(Warning): ...  # Backport from Python 3.10
28-33: Consider using more specific return type hint.

The current return type hint uses a union of generic IO with specific file types. For better type safety, consider using Union[BinaryIO, TextIO] instead of generic IO.
-) -> IO | bz2.BZ2File | gzip.GzipFile | lzma.LZMAFile:
+) -> Union[BinaryIO, TextIO, bz2.BZ2File, gzip.GzipFile, lzma.LZMAFile]:
58-90: Consider centralizing deadline management.

The deadline is currently hardcoded in the function. Consider moving it to a module-level constant for easier maintenance and consistency.
+# At module level
+IMPLICIT_MODE_DEADLINE = "2025-06-01"

-    _deadline: str = "2025-06-01"
+    _deadline: str = IMPLICIT_MODE_DEADLINE

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b4f1a6b and 53c63f6.

📒 Files selected for processing (1)

src/monty/io.py (1 hunks)

🔇 Additional comments (2)

src/monty/io.py (2)

99-108: ⚠️ Potential issue

Implement proper handling for .z files.

As discussed in previous reviews, the current implementation doesn't properly handle .z files, which could lead to incorrect behavior since Python's gzip module cannot handle LZW compression.

Consider raising an exception instead:

 if ext == ".z":
-    # TODO: drop ".z" extension support after 2026-01-01
-    warnings.warn(
-        "Python gzip is not able to (de)compress LZW-compressed files. "
-        "You should rename it to .gz. Support for the '.z' extension will "
-        "be removed after 2026-01-01.",
-        category=FutureWarning,
-        stacklevel=2,
-    )
-    return gzip.open(filename, mode, **kwargs)
+    raise NotImplementedError(
+        "Support for '.z' files is not implemented. "
+        "Python's gzip module cannot decompress LZW-compressed '.z' files. "
+        "Please use an alternative method to handle '.z' files or rename to .gz "
+        "if the file is actually gzip compressed."
+    )

83-90: Verify impact of UTF-8 default encoding.

The change to use UTF-8 as default encoding aligns with PEP 686, but we should verify this doesn't break existing code that might rely on system default encoding.

✅ Verification successful

Let me try a different approach to gather the necessary information.

Based on the search results, I can now generate the final response:

UTF-8 default encoding change is safe to proceed

The codebase already consistently uses UTF-8 encoding across all file operations:

All explicit file operations in tests use encoding="utf-8"
Text mode operations through zopen() in src/monty/re.py and src/monty/serialization.py already specify UTF-8
No instances found of file operations relying on system default encoding

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for potential encoding-sensitive file operations
rg -l "encoding.*=(?!.*utf)" . | while read -r file; do
  echo "=== $file ==="
  rg "encoding.*=(?!.*utf)" "$file" -A 2
done

# Search for potential binary file operations that might be affected
ast-grep --pattern 'zopen($_, "r")'

Length of output: 384

Script:

#!/bin/bash
# Search for file operations with explicit encoding
rg -A 2 'encoding\s*=' 

# Search for open operations in text mode
ast-grep --pattern 'open($_, "r")'
ast-grep --pattern 'open($_, "rt")'

# Search for potential file read operations
rg -A 2 '\.read\(\)'

Length of output: 11034

DanielYang59 · 2024-12-10T10:38:54Z

src/monty/io.py

+            category=EncodingWarning,
+            stacklevel=2,
+        )
+        kwargs["encoding"] = "utf-8"

    _name, ext = os.path.splitext(filename)


I guess splittext got updated at some point? It works directly with str as path now

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

src/monty/io.py (3)

25-25: Consider enhancing the EncodingWarning documentation

While the implementation is correct, adding a docstring would help clarify its purpose and usage, especially since it appears to be a backport of Python 3.10's EncodingWarning.

-class EncodingWarning(Warning): ...  # Added in Python 3.10
+class EncodingWarning(Warning):
+    """Warning category for encoding-related warnings.
+
+    This is a backport of Python 3.10's EncodingWarning to maintain
+    compatibility with earlier Python versions.
+    """
+    ...

58-58: Consider using a datetime object for the deadline

The deadline is currently a hardcoded string. Consider using a datetime object for better validation and comparison capabilities.

-_deadline: str = "2025-06-01"
+from datetime import datetime
+_deadline: datetime = datetime(2025, 6, 1)

95-110: Consider using a dictionary for extension mapping

The current implementation uses multiple if statements for extension mapping. Consider using a dictionary for better maintainability and performance.

+_COMPRESSION_HANDLERS = {
+    ".bz2": bz2.open,
+    ".gz": gzip.open,
+    ".xz": lzma.open,
+    ".lzma": lzma.open,
+}

-if ext == ".bz2":
-    return bz2.open(filename, mode, **kwargs)
-if ext == ".gz":
-    return gzip.open(filename, mode, **kwargs)
-if ext == ".z":
-    # TODO: drop ".z" extension support after 2026-01-01
-    warnings.warn(
-        "Python gzip is not able to (de)compress LZW-compressed files. "
-        "You should rename it to .gz. Support for the '.z' extension will "
-        "be removed after 2026-01-01.",
-        category=FutureWarning,
-        stacklevel=2,
-    )
-    return gzip.open(filename, mode, **kwargs)
-if ext in {".xz", ".lzma"}:
-    return lzma.open(filename, mode, **kwargs)
+handler = _COMPRESSION_HANDLERS.get(ext)
+if handler:
+    return handler(filename, mode, **kwargs)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 53c63f6 and eb0a4db.

📒 Files selected for processing (3)

src/monty/io.py (1 hunks)
tests/test_files/real_lzw_file.txt.Z (1 hunks)
tests/test_io.py (8 hunks)

✅ Files skipped from review due to trivial changes (1)

tests/test_files/real_lzw_file.txt.Z

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_io.py

🔇 Additional comments (4)

src/monty/io.py (4)

17-22: LGTM: Well-organized imports with proper type hints

The new imports are properly organized and the type hints are correctly conditioned with TYPE_CHECKING for runtime optimization.

28-56: LGTM: Well-designed function signature with clear documentation

The updated signature with positional-only parameter and comprehensive type hints improves clarity. The docstring effectively communicates the upcoming breaking changes and usage requirements.

82-90: LGTM: Proper implementation of UTF-8 default encoding

The implementation correctly follows PEP 686 by defaulting to UTF-8 encoding and warning users about implicit encoding usage.

99-108: ⚠️ Potential issue

Consider immediate deprecation of .z extension

Since Python's gzip module cannot properly handle LZW-compressed .z files, continuing to support them might lead to silent corruption or incorrect behavior. Consider raising a NotImplementedError instead of just warning.

 if ext == ".z":
-    # TODO: drop ".z" extension support after 2026-01-01
-    warnings.warn(
-        "Python gzip is not able to (de)compress LZW-compressed files. "
-        "You should rename it to .gz. Support for the '.z' extension will "
-        "be removed after 2026-01-01.",
-        category=FutureWarning,
-        stacklevel=2,
-    )
-    return gzip.open(filename, mode, **kwargs)
+    raise NotImplementedError(
+        "Support for '.z' files is not implemented as Python's gzip module "
+        "cannot handle LZW-compressed files. Please use '.gz' extension instead."
+    )

docstring tweak

712e70c

DanielYang59 commented Dec 7, 2024

View reviewed changes

src/monty/io.py Show resolved Hide resolved

add placeholder

01c6813

DanielYang59 changed the title ~~zopen explicitly use UTF-8 encoding in text mode~~ zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode Dec 7, 2024

DanielYang59 mentioned this pull request Dec 7, 2024

zopen: explicit binary/text mode , and explicit encoding as UTF-8 materialsproject/pymatgen#4219

Merged

1 task

DanielYang59 changed the title ~~zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode~~ zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode Dec 7, 2024

DanielYang59 changed the title ~~zopen explicitly use UTF-8 encoding in text mode, and disallow implicit binary/text mode~~ zopen changes: forbit implicit binary/text mode, signature change, default UTF-8 encoding in text mode Dec 8, 2024

DanielYang59 changed the title ~~zopen changes: forbit implicit binary/text mode, signature change, default UTF-8 encoding in text mode~~ zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode Dec 8, 2024

change zopen signature

3db323a

DanielYang59 commented Dec 8, 2024

View reviewed changes

src/monty/io.py Show resolved Hide resolved

warn against default encoding and implicity binary/text

3dc4210

DanielYang59 commented Dec 8, 2024

View reviewed changes

src/monty/io.py Outdated Show resolved Hide resolved

DanielYang59 added 3 commits December 8, 2024 14:47

deprecation .z extension support

5317c77

custom encoding warning

76dcd63

enhance unit test

625972a

DanielYang59 force-pushed the zopen-encoding branch from 77171ff to 3d17a62 Compare December 8, 2024 09:29

remove unnecessary static test file

671e8d4

DanielYang59 force-pushed the zopen-encoding branch from 3d17a62 to 671e8d4 Compare December 8, 2024 09:30

DanielYang59 changed the title ~~zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode~~ zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode, drop .z support after one-year Dec 8, 2024

explicit encoding

40ac109

DanielYang59 force-pushed the zopen-encoding branch from b2e5e67 to 40ac109 Compare December 8, 2024 09:42

mypy fix

39d622b

DanielYang59 force-pushed the zopen-encoding branch from 893c914 to 39d622b Compare December 8, 2024 09:54

This comment was marked as outdated.

Sign in to view

DanielYang59 added 2 commits December 8, 2024 22:04

I decide to trust users and not assume text by default

6694cb2

fix test in binary mode windows, and OS would overwrite the line sepa…

4bbab69

…rator

DanielYang59 force-pushed the zopen-encoding branch from 23c1c02 to 4bbab69 Compare December 8, 2024 14:10

minimize behaviour change, default mode as r instead of rt

ee51ca6

enable PEP 597 optional EncodingWarning

b4f1a6b

DanielYang59 marked this pull request as ready for review December 8, 2024 14:49

coderabbitai bot reviewed Dec 8, 2024

View reviewed changes

src/monty/io.py Outdated Show resolved Hide resolved

src/monty/io.py Outdated Show resolved Hide resolved

tests/test_os.py Show resolved Hide resolved

DanielYang59 marked this pull request as draft December 8, 2024 14:58

DanielYang59 added 2 commits December 8, 2024 23:00

fix fake LZW file handling

841c702

maybe not use PYTHONWARNDEFAULTENCODING

53c63f6

DanielYang59 marked this pull request as ready for review December 8, 2024 15:35

coderabbitai bot reviewed Dec 8, 2024

View reviewed changes

DanielYang59 marked this pull request as draft December 9, 2024 08:14

DanielYang59 added 2 commits December 9, 2024 22:34

add real LZW test

0c07987

looks like splittext works for Path directly

eb0a4db

DanielYang59 marked this pull request as ready for review December 10, 2024 10:37

DanielYang59 commented Dec 10, 2024

View reviewed changes

coderabbitai bot reviewed Dec 10, 2024

View reviewed changes

shyuep merged commit 6409f7c into materialsvirtuallab:master Dec 10, 2024
10 checks passed

DanielYang59 deleted the zopen-encoding branch December 11, 2024 01:13

coderabbitai bot mentioned this pull request Dec 11, 2024

Fix line ending handling in reverse_readfile/readline across OS, and not skipping empty lines #712

Merged

3 tasks

esoteric-ephemera mentioned this pull request Dec 20, 2024

What's the correct way to use the MAGMOM's value when using MP API data? materialsproject/pymatgen#4173

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`zopen` changes: forbid implicit binary/text `mode`, signature change, default UTF-8 `encoding` in text mode, drop `.z` support after one-year #730

`zopen` changes: forbid implicit binary/text `mode`, signature change, default UTF-8 `encoding` in text mode, drop `.z` support after one-year #730

DanielYang59 commented Dec 7, 2024 •

edited

Loading

coderabbitai bot commented Dec 7, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

codecov bot commented Dec 7, 2024 •

edited

Loading

This comment was marked as outdated.

coderabbitai bot left a comment

coderabbitai bot left a comment

DanielYang59 Dec 10, 2024

coderabbitai bot left a comment

zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode, drop .z support after one-year #730

zopen changes: forbid implicit binary/text mode, signature change, default UTF-8 encoding in text mode, drop .z support after one-year #730

Conversation

DanielYang59 commented Dec 7, 2024 • edited Loading

Summary

Signature change for zopen

Forbid implicit binary/text mode for zopen

Default "UTF-8" encoding in text mode

Drop support for .z extension after one-year grace period

coderabbitai bot commented Dec 7, 2024 • edited Loading

Walkthrough

Changes

Assessment against linked issues

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

codecov bot commented Dec 7, 2024 • edited Loading

Codecov Report

This comment was marked as outdated.

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

DanielYang59 Dec 10, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

`zopen` changes: forbid implicit binary/text `mode`, signature change, default UTF-8 `encoding` in text mode, drop `.z` support after one-year #730

`zopen` changes: forbid implicit binary/text `mode`, signature change, default UTF-8 `encoding` in text mode, drop `.z` support after one-year #730

DanielYang59 commented Dec 7, 2024 •

edited

Loading

Signature change for `zopen`

Forbid implicit binary/text mode for `zopen`

Default "UTF-8" `encoding` in text mode

Drop support for `.z` extension after one-year grace period

coderabbitai bot commented Dec 7, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Dec 7, 2024 •

edited

Loading