Newlines incorrectly handled in reverse_readfile on windows #471

Closed
pmrv opened this issue Feb 2, 2023 · 2 comments · Fixed by #700


pmrv commented Feb 2, 2023

System

  • Monty version: 2022.9.9
  • Python version: 3.11
  • OS version: windows

Summary

In reverse_readfile the line separator is hard-coded as \n. Because monty opens the file in binary mode, Python does not perform its usual newline translation, so lines read by reverse_readfile end with a spurious \r. I would expect reverse_readline to suffer from the same problem. I have only come across this on Windows, but a similar issue should occur with files that use classic Mac OS line endings, where monty would not detect any line breaks because the line separator is just \r.
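The effect described above can be reproduced without monty. The following is a minimal sketch, using only the standard library and a temporary file, of how a CRLF file looks when read in binary mode and split on \n, compared with a text-mode read that uses Python's default universal-newline handling. The variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")

    # Write raw bytes so the file has Windows (CRLF) line endings on any OS.
    with open(path, "wb") as f:
        f.write(b"Line1\r\nLine2\r\n")

    # Binary mode: no newline translation, so splitting on b"\n" keeps the "\r".
    with open(path, "rb") as f:
        binary_lines = f.read().split(b"\n")

    # Text mode (default newline=None): "\r\n" is translated to "\n" on read.
    with open(path, "r", encoding="utf-8") as f:
        text_lines = f.read().split("\n")

print(binary_lines)  # [b'Line1\r', b'Line2\r', b'']
print(text_lines)    # ['Line1', 'Line2', '']
```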

Example code

I don't have a working installation of Python and monty on Windows, but there's an example output in our CI here.

Suggested solution (if known)

Just guessing, but a simple solution might be to open the files in text mode or to pass the newline argument to the underlying Python functions, since you .decode('utf8') all strings anyway. I'm not sure whether this would interfere with your handling of compressed files. If it does, you would have to replace every occurrence of \n in the code with os.linesep.
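As a rough illustration of the text-mode suggestion, here is a minimal sketch of a reverse line reader that relies on Python's universal-newline translation. The helper name reverse_lines is hypothetical and is not part of monty. It reads the whole file into memory and does not attempt to reproduce monty's handling of compressed or very large files.

```python
import os
import tempfile


def reverse_lines(path, encoding="utf-8"):
    """Hypothetical helper: yield lines last-to-first, without line endings."""
    # newline=None enables universal newlines on read:
    # "\n", "\r\n" and "\r" are all translated to "\n".
    with open(path, "r", encoding=encoding, newline=None) as f:
        lines = f.read().split("\n")
    # A trailing newline leaves an empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    yield from reversed(lines)


# Try the helper on the three common line-ending conventions.
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, sep in (("crlf", b"\r\n"), ("lf", b"\n"), ("cr", b"\r")):
        path = os.path.join(tmp, name + ".txt")
        with open(path, "wb") as f:
            f.write(sep.join([b"Line1", b"Line2", b"Line3"]))
        results[name] = list(reverse_lines(path))

print(results)
```

Whether this approach fits monty depends on how compressed files are opened there. The sketch only shows that text-mode reading removes the need to hard-code a separator.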


DanielYang59 commented Jul 23, 2024

I'm able to reproduce this issue and will fix it today.

from monty.io import reverse_readfile


with open("sample_windows.txt", "w", newline="\r\n") as f:
    f.write("\r\n".join(["Line1", "Line2", "Line3"]))

with open("sample_unix_mac.txt", "w", newline="\n") as f:
    f.write("\n".join(["Line1", "Line2", "Line3"]))

for filename in ("sample_windows.txt", "sample_unix_mac.txt"):
    print(f"Reading file: {filename}")
    for line in reverse_readfile(filename):
        print(repr(line))

Generates:

Reading file: sample_windows.txt
'Line3'
'Line2\r\r'
'Line1\r\r'
Reading file: sample_unix_mac.txt
'Line3'
'Line2'
'Line1'
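One detail worth checking when reading this output: a file opened for writing in text mode with newline="\r\n" translates every "\n" in the written string to "\r\n", so a string that already contains "\r\n" ends up as "\r\r\n" on disk. The sketch below uses only the standard library and a temporary directory to compare the bytes produced that way with a binary write. It may account for the doubled \r shown above, whereas a file with plain CRLF endings would be expected to show a single trailing \r.

```python
import os
import tempfile

text = "\r\n".join(["Line1", "Line2", "Line3"])

with tempfile.TemporaryDirectory() as tmp:
    # Text mode with newline="\r\n": each "\n" written becomes "\r\n".
    path_text = os.path.join(tmp, "text_mode.txt")
    with open(path_text, "w", newline="\r\n") as f:
        f.write(text)
    with open(path_text, "rb") as f:
        raw_text_mode = f.read()

    # Binary mode: bytes are written exactly as given.
    path_bin = os.path.join(tmp, "binary_mode.txt")
    with open(path_bin, "wb") as f:
        f.write(text.encode("utf-8"))
    with open(path_bin, "rb") as f:
        raw_binary_mode = f.read()

print(raw_text_mode)    # b'Line1\r\r\nLine2\r\r\nLine3'
print(raw_binary_mode)  # b'Line1\r\nLine2\r\nLine3'
```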

@DanielYang59

The issue indeed exists for reverse_readline, with:

from monty.io import reverse_readline


with open("sample_windows.txt", "w", newline="\r\n") as f:
    f.write("\r\n".join(["Line1", "Line2", "Line3"]))

with open("sample_unix_mac.txt", "w", newline="\n") as f:
    f.write("\n".join(["Line1", "Line2", "Line3"]))

for filename in ("sample_windows.txt", "sample_unix_mac.txt"):
    print(f"Reading file: {filename}")
    with open(filename) as file:
        for line in reverse_readline(file):
            print(repr(line))

We now have:

Reading file: sample_windows.txt
'Line3'
''
'Line2'
''
'Line1'
Reading file: sample_unix_mac.txt
'Line3'
'Line2'
'Line1'
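The empty strings in this output can be compared against what plain text-mode reading does with a "\r\r\n" sequence, which is what a text-mode write with newline="\r\n" produces from a string that already contains "\r\n". The sketch below does not use monty. It writes those bytes directly and reads them back with the default universal-newline handling, under which the lone "\r" and the following "\r\n" each become a separate "\n".

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "double_cr.txt")

    # Bytes with a doubled carriage return before each line feed.
    with open(path, "wb") as f:
        f.write(b"Line1\r\r\nLine2\r\r\nLine3")

    # Default text mode: "\r" and "\r\n" are each translated to "\n".
    with open(path, "r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)  # ['Line1', '', 'Line2', '', 'Line3']
```

Reversing this list gives the same sequence as the sample_windows.txt output above, which suggests the blank entries may come from the way the sample file was written and opened rather than from reverse_readline alone.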
