Tokenize does not roundtrip {{ after \n #125008
Furthermore, here is the output of the following code:

```python
import tokenize, io

source_code = r'f"\n{{test}}"'
tokens = tokenize.generate_tokens(io.StringIO(source_code).readline)
for t in tokens:
    print(t)
```

```
TokenInfo(type=61 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"\\n{{test}}"')
TokenInfo(type=62 (FSTRING_MIDDLE), string='\\n{', start=(1, 2), end=(1, 5), line='f"\\n{{test}}"')
TokenInfo(type=62 (FSTRING_MIDDLE), string='test}', start=(1, 6), end=(1, 11), line='f"\\n{{test}}"')
TokenInfo(type=63 (FSTRING_END), string='"', start=(1, 12), end=(1, 13), line='f"\\n{{test}}"')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 13), end=(1, 14), line='f"\\n{{test}}"')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
```

So it seems that the line is getting in all right, but the `\n{{` is being turned into a `\n{` in the tokenizer somehow. The same erroneous output appears for the bytes version (with an rb-string, `BytesIO` and `tokenize.tokenize`).
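Completing the roundtrip with `tokenize.untokenize` makes the dropped brace directly observable on affected interpreters. A minimal sketch (the `roundtrip` helper name is made up here); it uses compat mode, i.e. only `(type, string)` pairs, as in the reports in this thread:

```python
import io
import tokenize

def roundtrip(source: str) -> str:
    # Compat mode: untokenize receives only (type, string) pairs, so it has
    # to reconstruct spacing and f-string brace escaping itself.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return tokenize.untokenize((t.type, t.string) for t in tokens)

# On affected 3.12 releases this prints f"\n{test}}" (one "{" lost);
# on fixed versions both braces survive.
print(roundtrip(r'f"\n{{test}}"'))
```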
It looks like this was a regression in Python 3.12; I can't reproduce the behaviour with Python 3.11. I'm guessing it was caused by the PEP 701 changes.
Reproduced on the
This seems to happen with other escape characters as well:

```python
import tokenize, io

source_code = r'f"""\t{{test}}"""'
tokens = tokenize.generate_tokens(io.StringIO(source_code).readline)
x = tokenize.untokenize((t, s) for t, s, *_ in tokens)
print(x)  # f"""\t{test}}"""
```

```python
import tokenize, io

source_code = r'f"""\r{{test}}"""'
tokens = tokenize.generate_tokens(io.StringIO(source_code).readline)
x = tokenize.untokenize((t, s) for t, s, *_ in tokens)
print(x)  # f"""\r{test}}"""
```
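These corrupted outputs are not just textually different from the input: the stray single `}` makes them invalid syntax, which can be shown by compiling one of the broken strings quoted above (an illustrative sketch, not from the thread):

```python
# The buggy roundtrip turned f"""\t{{test}}""" into f"""\t{test}}""",
# leaving an unmatched single "}" after the {test} replacement field.
broken = 'f"""\\t{test}}"""'

try:
    compile(broken, "<roundtrip>", "eval")
except SyntaxError as exc:
    # e.g. "f-string: single '}' is not allowed"
    print("rejected:", exc.msg)
```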
I think the issue is in this method (Lines 187 to 208 in 16cd6cc). This PR fixed the handling of Unicode literals (e.g. `\N{...}` escapes):

```diff
 if character == "{":
     n_backslashes = sum(
         1 for char in _itertools.takewhile(
             "\\".__eq__,
             characters[-2::-1]
         )
     )
-    if n_backslashes % 2 == 0:
+    if n_backslashes % 2 == 0 or characters[-1] != "N":
         characters.append(character)
     else:
         consume_until_next_bracket = True
```
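For context on the `!= "N"` special case (an illustrative aside, not from the thread): after an odd number of backslashes, a `{` opens a `\N{...}` named-character escape, whose braces belong to the escape and must not be doubled when re-escaping an f-string body; after an even number, the backslashes escape each other and the brace is an ordinary one:

```python
# Odd backslash count: \N{...} is a named-character escape; the braces
# are part of the escape and stay single.
named = "\N{BULLET}"
print(len(named))    # 1 -- the single character '•'

# Even backslash count: the backslash is escaped, so "N{BULLET}" is plain
# text; inside an f-string such a brace would need to be written as "{{".
escaped = "\\N{BULLET}"
print(len(escaped))  # 10 -- the literal characters \N{BULLET}
```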
…on (GH-125013) (cherry picked from commit db23b8b) Co-authored-by: Tomas R. <[email protected]>
Extremely late for me to say this, but I thought I'd add: I only unearthed this bug because I'm working on code that incidentally tries to tokenizer-roundtrip everything in https://github.com/hauntsaninja/mypy_primer. So, doing something like that (possibly literally just that) as a test case for tokenize could perhaps be a good idea to prevent future regressions — although setting that up sounds like a hassle!
We actually already do that here for some random files in the test folder: cpython/Lib/test/test_tokenize.py, Lines 1822 to 1835 in 26d6277.
Though, so far it only compares the tokens, not the actual source code. I'd like to extend this check to compare the source code as well here: #126010
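A source-level roundtrip check of the kind proposed there could look roughly like this (a sketch; the helper name is made up, and it relies on untokenize restoring spacing from the positions in full 5-tuples):

```python
import io
import tokenize

def assert_source_roundtrips(source: str) -> None:
    # With full TokenInfo 5-tuples (not just (type, string) pairs),
    # untokenize uses the recorded positions to restore spacing, so for
    # these inputs the source text is reproduced exactly.
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    rebuilt = tokenize.untokenize(tokens)
    assert rebuilt == source, f"roundtrip changed source: {rebuilt!r}"

assert_source_roundtrips("x = 1 + 2\n")
assert_source_roundtrips("def f():\n    return 42\n")
```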
Bug report
Bug description:
Expected:

```
f"\n{{test}}"
```

Got:

```
f"\n{test}}"
```
Note the absence of a second `{` in the `{{` after the `\n`, but in no other positions.
Unlike some other roundtrip failures of tokenize, some of which are minor infelicities, this one actually creates a syntactically invalid program on roundtrip, which is quite bad. You get a

```
SyntaxError: f-string: single '}' is not allowed
```

when trying to use the results.

CPython versions tested on:
3.12
Operating systems tested on:
Linux, Windows
Linked PRs

- `tokenize.untokenize` roundtrip for `\n{{` #125013
- `tokenize.untokenize` roundtrip for `\n{{` (GH-125013) #125020
- `tokenize.untokenize` roundtrip for `\n{{` (GH-125013) #125021