Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zopen: explicit binary/text mode , and explicit encoding as UTF-8 #4219

Merged
merged 14 commits into from
Dec 11, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Dec 7, 2024

Summary

Detailed reasoning and example are given in materialsvirtuallab/monty#730 (comment). In a word, not passing b/t in mode would cause different and unexpected behaviour when opening a text file (file.txt) versus a compressed file (file.gz), for the former we would get TextIO which gives string but BinaryIO for the latter which gives bytes.


There're few cases where this wouldn't cause problem, for example json.load accepts both binary and text IO:

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

For such cases, I tested with the pdag3_complete_dos.json.gz file, "text mode" seems slightly faster:

Binary mode: Loaded data (first 100 chars): {'@module': 'pymatgen.electronic_structure.dos', '@class': 'CompleteDos', 'efermi': -0.28973031, 'st...
Time taken (binary mode): 0.3674 seconds
Text mode: Loaded data (first 100 chars): {'@module': 'pymatgen.electronic_structure.dos', '@class': 'CompleteDos', 'efermi': -0.28973031, 'st...
Time taken (text mode): 0.3450 seconds

@DanielYang59 DanielYang59 changed the title Explicit mode for zopen Explicit binary/text mode for zopen Dec 7, 2024
@@ -1761,5 +1761,5 @@ def write_file(
mode: Literal["w", "a", "wt", "at"] = "w",
Copy link
Contributor Author

@DanielYang59 DanielYang59 Dec 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail if filename is a name for compressed file (e.g. file.gz), in this case file would be BinaryIO and we cannot write str into it:

def write_file(
self,
filename: str | Path,
mode: Literal["w", "a", "wt", "at"] = "w",
) -> None:
"""Write the CIF file."""
with zopen(filename, mode=mode) as file:
file.write(str(self))

w and a have been removed in 4fccd59, with default changed to wt

@DanielYang59 DanielYang59 changed the title Explicit binary/text mode for zopen zopen: explicit binary/text mode , and explicit encoding as UTF-8 Dec 8, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review December 8, 2024 15:35
@shyuep shyuep merged commit b2a5e0f into materialsproject:master Dec 11, 2024
42 of 43 checks passed
@DanielYang59 DanielYang59 deleted the zopen-explicit-mode branch December 11, 2024 02:30
@DanielYang59 DanielYang59 restored the zopen-explicit-mode branch December 11, 2024 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants