untokenize of specially crafted escaped characters does not round trip properly #125821

Closed
asottile opened this issue Oct 21, 2024 · 5 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@asottile
Contributor

asottile commented Oct 21, 2024

Bug report

Bug description:

this small program does not roundtrip through tokenize / untokenize -- it appears to mishandle the escaped quote as a \N{NAMED ESCAPE}

bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

this is what it produces after a round of untokenization:

$ python3 t.py t3.py 
bar = 1
print(f"{bar} \"{ SNOWMAN}} {{foo}}")
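The `t.py` script used above is not shown in this thread, so the following is only a minimal sketch of a round-trip check, assuming it does something similar. The helper name `roundtrips` and the `SOURCE` constant are made up for illustration; only `tokenize.generate_tokens` and `tokenize.untokenize` come from the standard library.

```python
import io
import tokenize


def roundtrips(source: str) -> bool:
    """Return True if tokenize -> untokenize reproduces `source` exactly."""
    # Hypothetical helper: feed full 5-tuple tokens back into untokenize.
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    return tokenize.untokenize(tokens) == source


# The program from the report; the raw string keeps the backslash before the quote.
SOURCE = (
    "bar = 1\n"
    r'print(f"{bar} \"{{SNOWMAN}} {{foo}}")' "\n"
)

if __name__ == "__main__":
    # The result depends on whether the interpreter includes the untokenize fix.
    print(roundtrips(SOURCE))
```

On an interpreter affected by the bug this should print `False`; on a build that includes the fix it should print `True`.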

annoyingly, tokenize_rt suffers from a different but related bug, which is why I was investigating this to begin with. as an aside, the handling of curly braces in 3.12+ tokenization is a huge pain!

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

@asottile asottile added the type-bug An unexpected behavior, bug, or error label Oct 21, 2024
@asottile
Contributor Author

cc @pablogsal ecf16ee

@pablogsal
Member

pablogsal commented Oct 22, 2024

Sorry, it is a bit late here and I had a long day, but what am I missing here:

>>> import tokenize
>>> import io
>>> code = r'f"{bar} \"{{SNOWMAN}} {{foo}}"'
>>> tokens = list(tokenize.generate_tokens(io.StringIO(code).readline))
>>> tokenize.untokenize(tokens)
'f"{bar} \\"{{SNOWMAN}} {{foo}}"'
>>> code
'f"{bar} \\"{{SNOWMAN}} {{foo}}"'

Same with a file:

$ cat lel.py
bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

$ ./python.exe
Python 3.14.0a1+ (heads/main-dirty:03f9264ecef, Oct 22 2024, 09:15:42) [Clang 16.0.0 (clang-1600.0.26.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize
>>> data = open('lel.py')
>>> tokens = list(tokenize.generate_tokens(data.readline))
>>> print(tokenize.untokenize(tokens))
bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

Sorry if I am missing anything obvious. Can you provide a small reproducer I can run?

@pablogsal
Member

pablogsal commented Oct 22, 2024

Ah, this was fixed in db23b8b in main and was backported to 3.12, but that is not yet released:

#125021

@pablogsal
Member

pablogsal commented Oct 22, 2024

Closing as duplicate of #125008

@pablogsal pablogsal marked this as a duplicate of #125008 Oct 22, 2024
@pablogsal pablogsal closed this as not planned Oct 22, 2024
@asottile
Contributor Author

hah yep -- just checked my vod and I spent all the time waiting on a build only to rerun my system python 🤦
