untokenize of specially crafted escaped characters does not round trip properly #125821

asottile · 2024-10-21T23:35:32Z

Bug report

Bug description:

this small program does not roundtrip through tokenize / untokenize -- it appears to mishandle the escaped quote as a \N{NAMED ESCAPE}

bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

this is what it produces after a round of untokenization:

$ python3 t.py t3.py 
bar = 1
print(f"{bar} \"{ SNOWMAN}} {{foo}}")

annoyingly, tokenize_rt suffers from a different, related bug, which is why I was investigating this to begin with. as an aside, the handling of curly braces in 3.12+ tokenization is a huge pain!
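The `t.py` script used above is not shown in the report. A minimal sketch of an equivalent round-trip check, assuming only the standard `tokenize` module, might look like the following; the names `roundtrips` and `SOURCE` are hypothetical and chosen for illustration. The result depends on the interpreter: builds that include the fix should print `True`, affected 3.12+ builds `False`.

```python
import io
import tokenize


def roundtrips(source: str) -> bool:
    """Return True if tokenize -> untokenize reproduces `source` exactly."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    return tokenize.untokenize(tokens) == source


# The two-line program from the report. The second line is a raw string so
# the backslash before the quote reaches the tokenizer unchanged.
SOURCE = "bar = 1\n" + r'print(f"{bar} \"{{SNOWMAN}} {{foo}}")' + "\n"

if __name__ == "__main__":
    # Outcome varies by interpreter version, so no expected value is given.
    print(roundtrips(SOURCE))
```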

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux


asottile · 2024-10-21T23:35:54Z

cc @pablogsal ecf16ee

pablogsal · 2024-10-22T08:24:39Z

Sorry, it's a bit late here and I had a long day, but what am I missing here?

>>> import tokenize
>>> import io
>>> code = r'f"{bar} \"{{SNOWMAN}} {{foo}}"'
>>> tokens = list(tokenize.generate_tokens(io.StringIO(code).readline))
>>> tokenize.untokenize(tokens)
'f"{bar} \\"{{SNOWMAN}} {{foo}}"'
>>> code
'f"{bar} \\"{{SNOWMAN}} {{foo}}"'

Same with a file:

$ cat lel.py
bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

$ ./python.exe
Python 3.14.0a1+ (heads/main-dirty:03f9264ecef, Oct 22 2024, 09:15:42) [Clang 16.0.0 (clang-1600.0.26.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize
>>> data = open('lel.py')
>>> tokens = list(tokenize.generate_tokens(data.readline))
>>> print(tokenize.untokenize(tokens))
bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

Sorry if I am missing anything obvious. Can you provide a small reproducer I can run?

pablogsal · 2024-10-22T08:38:18Z

Ah, this was fixed in db23b8b on main and backported to 3.12, but the fix has not been released yet:

#125021

pablogsal · 2024-10-22T08:40:28Z

Closing as duplicate of #125008

asottile · 2024-10-22T10:47:42Z

hah yep -- just checked my vod and I spent all the time waiting on a build only to rerun my system python 🤦
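Since the false alarm came from running a different interpreter than the one just built, a small sketch like the one below can report which Python is executing alongside the round-trip result. It uses only `sys` and `tokenize` from the standard library; the function name `report` and the dictionary keys are hypothetical.

```python
import io
import sys
import tokenize


def report(source: str) -> dict:
    """Round-trip `source` and record which interpreter produced the result."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    result = tokenize.untokenize(tokens)
    return {
        "executable": sys.executable,         # path of the running interpreter
        "version": sys.version.split()[0],    # e.g. a "3.x.y" style string
        "roundtrip_ok": result == source,
    }


if __name__ == "__main__":
    source = "bar = 1\n" + r'print(f"{bar} \"{{SNOWMAN}} {{foo}}")' + "\n"
    for key, value in report(source).items():
        print(f"{key}: {value}")
```

Running it once with the system `python3` and once with the freshly built binary makes it obvious which interpreter a given result came from.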

asottile added the type-bug label Oct 21, 2024

pablogsal marked this as a duplicate of #125008 Oct 22, 2024

pablogsal closed this as not planned (duplicate) Oct 22, 2024
