Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax highlighting for format strings #4006

Merged
merged 4 commits into from
Apr 24, 2020

Conversation

ltentrup
Copy link
Contributor

I have an implementation for syntax highlighting for format string modifiers {}.
The first commit refactors the changes in #3826 into a separate struct.
The second commit implements the highlighting: first we check in a macro call whether the macro is a format macro from std. In this case, we remember the format string node. If we encounter this node during syntax highlighting, we check for the format modifiers {} using regular expressions.

There are a few places which I am not quite sure:

  • Is the way I extract the macro names correct?
  • Is the HighlightTag::Attribute suitable for highlighting the {}?

Let me know what you think, any feedback is welcome!

@lnicola
Copy link
Member

lnicola commented Apr 17, 2020

I guess it doesn't fix it, but CC #3419.

@ltentrup
Copy link
Contributor Author

Thanks, I forgot to include a link! It is related, but for actually checking the syntax it is probably needed to create a separate parser, which this PR does not do.

crates/ra_ide/src/syntax_highlighting.rs Outdated Show resolved Hide resolved
static ref RE: Regex =
Regex::new(r"[^\{](?P<format>\{(?:\}|[^\{].*?\}))[^\}]").unwrap();
}
for (start, end) in RE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather then using ad-hoc regex parsing, I suggest that we install some future proofing here. Eventually, we'd want to handle and color escape sequences properly, as well as check the semantics of format specifiers inside {}.

So I think we need a new function, lex_string_literal, which returns inner "tokens" comprising the string. I think it should live in ra_syntax/ast/tokens.rs as a method on ast::String. Ideally, it should have the following return type

enum StrngPiece {
    Quote,
    LiteralSequence,
    EscapeSequence,
    FormatSpecifier,
}

fn lex(&self) -> impl Iterator<Item = (TextRange, StringPIece)> { ... }

We, however, already use internal iteration for escape sequences (see the call to unescape_str), so, instead of returning an iterator, we might just accept the function with (TextRange, StringPIece) argument. I don't think we necessary need to handle escape sequences in this PR (although we might do that as well, they also should be colored), but I think it's important to properly isolate string lexer into a separate function, becase we'll have to extend it eventually to handle everything. Oh, and I think it's better to avoid regex here and code everything in a manual way, as we'd need that for escape sequences anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to your detailed description it was straightforward to implement 👍
I have updated the PR to include the manual lexer (for format specifier and escape sequences). I have also included some tests for the happy path, in case of an error I opted to just return the neutral StringPiece::LiteralSequence.
The current implementation does not support RawStrings, though.

// unicode escape
chars.next();
let mut cloned = chars.clone().take(8); // up to 6 digits + opening `{` and closing `}`
if let Some(next_char) = cloned.next() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't clear about this, we already have logic for this implemented in rustc_lexer:

https://github.com/rust-lang/rust/blob/52fa23add6fb0776b32cc591ac928618391bdf41/src/librustc_lexer/src/unescape.rs#L207-L260

So, rather than re-doing this, we should use the unscape_xxx family of funcitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unescape function cannot be used in a compositional way together with the format specifier so I decided to focus in this PR on the format specifier. In a follow-up it should be easy to include escape sequences by using the unescape functions and a second iteration over the string as escape sequences and format specifier should have mutual exclusive ranges (and the sorting of syntax ranges was implemented in #4022).

Detailed changes:
1) Implement a lexer for string literals that divides the string in format specifier `{}` including the format specifier modifier.
2) Adapt syntax highlighting to add ranges for the detected sequences.
3) Add a test case for the format string syntax highlighting.
crates/ra_ide/src/syntax_highlighting.rs Outdated Show resolved Hide resolved
crates/ra_syntax/src/ast/tokens.rs Outdated Show resolved Hide resolved
// integer
read_integer(&mut chars, initial_len, callback);
}
'a'..='z' | 'A'..='Z' | '_' => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do format specifiers allow unicode identifiers on nightly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they are actually allowed even on stable (which can be considered a bug):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d9ec6e983b041ee8937c6b6900734480

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for those who can't read Russian, it means "nonsense" according to google translate)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I would rather translate that as "wow". A picture is worth a thousand words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based the code on https://doc.rust-lang.org/reference/identifiers.html which says identifiers are ASCII only. Escape sequences work on stable as well: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7ad2018ba9899b4364a042eeb7dd4789
That means one has to interleave escape sequences and format specifier lexing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR to support escaped sequences and Unicode literals

@matklad
Copy link
Member

matklad commented Apr 24, 2020

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 24, 2020

Build succeeded:

@Geobert
Copy link
Contributor

Geobert commented Apr 27, 2020

Running nightly 7a9ba16, I don't see this working. Do I need to add some configuration or configure their color?

@lnicola
Copy link
Member

lnicola commented Apr 27, 2020

Try

    "editor.tokenColorCustomizationsExperimental": {
        "attribute": "#ff0000"
    },

Is the HighlightTag::Attribute suitable for highlighting the {}?

My gut feeling is no 😄.

@Geobert
Copy link
Contributor

Geobert commented Apr 27, 2020

Thank it was that indeed, but I don't want to change my attributes color, so I agree with @lnicola , no, attribute is not suitable for {} ^^'

@ltentrup
Copy link
Contributor Author

I wasn't aware that it is possible to introduce new tokens, the follow-up PR #4183 implements this change.

bors bot added a commit that referenced this pull request Apr 28, 2020
4183: Introduce new semantic highlight token for format specifier r=matklad a=ltentrup

Follow up from #4006: Instead of using the `attribute` highlight token, introduce a new semantic token for format specifier.

Co-authored-by: Leander Tentrup <[email protected]>
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Jan 7, 2025
Fix rust-lang#3846 properly, so that subtrees can be skipped again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants