Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generator doesn't preserve individual string quote style #7799

Closed
dhruvmanila opened this issue Oct 4, 2023 · 1 comment · Fixed by #15818
Closed

Code generator doesn't preserve individual string quote style #7799

dhruvmanila opened this issue Oct 4, 2023 · 1 comment · Fixed by #15818
Assignees
Labels
bug Something isn't working fixes Related to suggested fixes for violations

Comments

@dhruvmanila
Copy link
Member

This is related to the Generator struct.

This is especially true for triple-quoted strings. The Generator will normalize them to single quotes and perform some transformations which may or may not be correct:

  • Escape the inner quotes if they match the outer ones
  • Preserve the whitespaces (for example, using \n)

Related issues:

@sanxiyn
Copy link
Contributor

sanxiyn commented Feb 15, 2024

I am interested in working on this.

AlexWaygood added a commit that referenced this issue Mar 8, 2024
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for #7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
nkxxll pushed a commit to nkxxll/ruff that referenced this issue Mar 10, 2024
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to astral-sh#10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for astral-sh#7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in astral-sh#10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
@AlexWaygood AlexWaygood self-assigned this May 6, 2024
@ntBre ntBre assigned ntBre and unassigned AlexWaygood Jan 24, 2025
ntBre added a commit that referenced this issue Jan 27, 2025
## Summary

This is a first step toward fixing #7799 by using the quoting style
stored in the `flags` field on `ast::StringLiteral`s to select a quoting
style. This PR does not include support for f-strings or byte strings.

Several rules also needed small updates to pass along existing quoting
styles instead of using `StringLiteralFlags::default()`. The remaining
snapshot changes are intentional and should preserve the quotes from the
input strings.

## Test Plan

Existing tests with some accepted updates, plus a few new RUF055 tests
for raw strings.

---------

Co-authored-by: Alex Waygood <[email protected]>
ntBre added a commit that referenced this issue Feb 4, 2025
## Summary

This is a follow-up to #15726, #15778, and #15794 to preserve the triple
quote and prefix flags in plain strings, bytestrings, and f-strings.

I also added a `StringLiteralFlags::without_triple_quotes` method to
avoid passing along triple quotes in rules like SIM905 where it might
not make sense, as discussed
[here](#15726 (comment)).

## Test Plan

Existing tests, plus many new cases in the `generator::tests::quote`
test that should cover all combinations of quotes and prefixes, at least
for simple string bodies.

Closes #7799 when combined with #15694, #15726, #15778, and #15794.

---------

Co-authored-by: Alex Waygood <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixes Related to suggested fixes for violations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants