Syntax highlighting for format strings #4006

ltentrup · 2020-04-17T07:54:23Z

I have an implementation for syntax highlighting for format string modifiers {}.
The first commit refactors the changes in #3826 into a separate struct.
The second commit implements the highlighting: first we check in a macro call whether the macro is a format macro from std. In this case, we remember the format string node. If we encounter this node during syntax highlighting, we check for the format modifiers {} using regular expressions.

There are a few places I am not quite sure about:

Is the way I extract the macro names correct?
Is the HighlightTag::Attribute suitable for highlighting the {}?

Let me know what you think, any feedback is welcome!
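
For illustration only, the macro check described above could be sketched as a plain name lookup. The function names `is_format_macro` and `format_string_arg_index` are hypothetical and are not taken from the PR; the list covers std macros that accept a format string, and the real implementation may resolve macros differently.

```rust
/// Hypothetical helper: returns true if `name` is a std macro that takes a
/// format string. The list is an assumption for this sketch.
fn is_format_macro(name: &str) -> bool {
    matches!(
        name,
        "format" | "format_args" | "print" | "println" | "eprint" | "eprintln"
            | "write" | "writeln" | "panic"
    )
}

/// Hypothetical helper: position of the format string among the macro
/// arguments. `write!` and `writeln!` take the destination first.
fn format_string_arg_index(name: &str) -> Option<usize> {
    match name {
        "write" | "writeln" => Some(1),
        _ if is_format_macro(name) => Some(0),
        _ => None,
    }
}
```

A name-only check like this would also match user-defined macros that happen to share a name with a std macro, which is one reason the question about extracting macro names matters.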

lnicola · 2020-04-17T07:58:37Z

I guess it doesn't fix it, but CC #3419.

ltentrup · 2020-04-17T08:00:22Z

Thanks, I forgot to include a link! It is related, but actually checking the syntax would probably require a separate parser, which this PR does not add.

matklad · 2020-04-17T09:08:35Z

crates/ra_ide/src/syntax_highlighting.rs

+                    static ref RE: Regex =
+                        Regex::new(r"[^\{](?P<format>\{(?:\}|[^\{].*?\}))[^\}]").unwrap();
+                }
+                for (start, end) in RE


Rather than using ad-hoc regex parsing, I suggest that we install some future proofing here. Eventually, we'd want to handle and color escape sequences properly, as well as check the semantics of format specifiers inside {}.

So I think we need a new function, lex_string_literal, which returns inner "tokens" comprising the string. I think it should live in ra_syntax/ast/tokens.rs as a method on ast::String. Ideally, it should have the following return type

    enum StringPiece {
        Quote,
        LiteralSequence,
        EscapeSequence,
        FormatSpecifier,
    }

    fn lex(&self) -> impl Iterator<Item = (TextRange, StringPiece)> { ... }

We, however, already use internal iteration for escape sequences (see the call to unescape_str), so, instead of returning an iterator, we might just accept a function taking a (TextRange, StringPiece) argument. I don't think we necessarily need to handle escape sequences in this PR (although we might do that as well, since they should also be colored), but I think it's important to properly isolate the string lexer into a separate function, because we'll have to extend it eventually to handle everything. Oh, and I think it's better to avoid regex here and code everything manually, as we'd need that for escape sequences anyway.
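
A minimal sketch of the callback-style lexer suggested above, under several assumptions: it uses `std::ops::Range<usize>` byte ranges in place of `TextRange`, it is a free function named `lex_format_string` (a hypothetical name) instead of a method on `ast::String`, it only distinguishes literal text from format specifiers, and it does not look inside a specifier. It is not the code from the PR.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StringPiece {
    LiteralSequence,
    FormatSpecifier,
}

/// Walks `text` (the string contents without the surrounding quotes) and
/// reports each piece together with its byte range through `callback`.
fn lex_format_string(text: &str, callback: &mut dyn FnMut(Range<usize>, StringPiece)) {
    let bytes = text.as_bytes();
    let mut literal_start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // `{{` and `}}` are escaped braces and stay part of the literal run.
            b'{' if bytes.get(i + 1) == Some(&b'{') => i += 2,
            b'}' if bytes.get(i + 1) == Some(&b'}') => i += 2,
            b'{' => match text[i..].find('}') {
                Some(offset) => {
                    let end = i + offset + 1;
                    // Flush the literal text collected before the specifier.
                    if literal_start < i {
                        callback(literal_start..i, StringPiece::LiteralSequence);
                    }
                    callback(i..end, StringPiece::FormatSpecifier);
                    i = end;
                    literal_start = end;
                }
                // Unterminated specifier: treat the rest as literal text.
                None => break,
            },
            _ => i += 1,
        }
    }
    if literal_start < text.len() {
        callback(literal_start..text.len(), StringPiece::LiteralSequence);
    }
}
```

A caller such as the highlighter could collect the reported ranges into a vector or push highlight ranges directly from the closure. Passing a callback avoids building an iterator type, which matches the internal-iteration style mentioned for unescape_str.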

Thanks to your detailed description, it was straightforward to implement 👍
I have updated the PR to include the manual lexer (for format specifiers and escape sequences). I have also included some tests for the happy path; in case of an error, I opted to just return the neutral StringPiece::LiteralSequence.
The current implementation does not support RawStrings, though.

matklad · 2020-04-18T18:00:38Z

crates/ra_syntax/src/ast/tokens.rs

+                            // unicode escape
+                            chars.next();
+                            let mut cloned = chars.clone().take(8); // up to 6 digits + opening `{` and closing `}`
+                            if let Some(next_char) = cloned.next() {


Sorry, I wasn't clear about this, we already have logic for this implemented in rustc_lexer:

https://github.com/rust-lang/rust/blob/52fa23add6fb0776b32cc591ac928618391bdf41/src/librustc_lexer/src/unescape.rs#L207-L260

So, rather than redoing this, we should use the unescape_xxx family of functions.

The unescape function cannot be composed with the format specifier lexing, so I decided to focus this PR on format specifiers. In a follow-up, it should be easy to include escape sequences by using the unescape functions in a second iteration over the string, since escape sequences and format specifiers should have mutually exclusive ranges (and the sorting of syntax ranges was implemented in #4022).

Detailed changes: 1) Implement a lexer for string literals that splits the string into format specifiers `{}`, including the format specifier modifiers. 2) Adapt syntax highlighting to add ranges for the detected sequences. 3) Add a test case for the format string syntax highlighting.

bjorn3 · 2020-04-20T09:36:35Z

crates/ra_syntax/src/ast/tokens.rs

+                            // integer
+                            read_integer(&mut chars, initial_len, callback);
+                        }
+                        'a'..='z' | 'A'..='Z' | '_' => {


Do format specifiers allow unicode identifiers on nightly?

I believe they are actually allowed even on stable (which can be considered a bug):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d9ec6e983b041ee8937c6b6900734480

(for those who can't read Russian, it means "nonsense" according to google translate)

Hm, I would rather translate that as "wow". A picture is worth a thousand words.

I based the code on https://doc.rust-lang.org/reference/identifiers.html which says identifiers are ASCII only. Escape sequences work on stable as well: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7ad2018ba9899b4364a042eeb7dd4789
That means one has to interleave escape sequences and format specifier lexing.
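
One way to picture the interleaving is to decode escapes first while remembering the source range of every decoded character, and then look for specifiers in the decoded text. The sketch below assumes that escape sequences may appear inside a specifier, as the playground link suggests. It handles only a small subset of escapes (`\n`, `\t`, `\\`, `\"`, `\u{..}`), uses byte ranges instead of `TextRange`, and all function names are hypothetical. It is not the PR's implementation and does not use rustc_lexer.

```rust
use std::ops::Range;

/// Parses a `\u{HEX}` escape at the start of `rest`; returns the decoded
/// char and the number of source bytes consumed.
fn parse_unicode_escape(rest: &str) -> Option<(char, usize)> {
    let after_prefix = rest.strip_prefix("\\u{")?;
    let close = after_prefix.find('}')?;
    let value = u32::from_str_radix(&after_prefix[..close], 16).ok()?;
    let decoded = char::from_u32(value)?;
    // `\u{` is 3 bytes, then the digits, then the closing `}`.
    Some((decoded, 3 + close + 1))
}

/// Decodes the escape subset, pairing each resulting char with the source
/// range it came from. Decoding stops at the first malformed escape.
fn decode_with_ranges(src: &str) -> Vec<(char, Range<usize>)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < src.len() {
        let rest = &src[pos..];
        let c = match rest.chars().next() {
            Some(c) => c,
            None => break,
        };
        if c != '\\' {
            out.push((c, pos..pos + c.len_utf8()));
            pos += c.len_utf8();
            continue;
        }
        let (decoded, len) = match rest.as_bytes().get(1).copied() {
            Some(b'n') => ('\n', 2),
            Some(b't') => ('\t', 2),
            Some(b'\\') => ('\\', 2),
            Some(b'"') => ('"', 2),
            Some(b'u') => match parse_unicode_escape(rest) {
                Some(pair) => pair,
                None => break,
            },
            _ => break,
        };
        out.push((decoded, pos..pos + len));
        pos += len;
    }
    out
}

/// Finds `{...}` specifiers in the decoded text and maps them back to
/// source ranges. `{{` and `}}` are treated as escaped braces.
fn format_specifier_ranges(src: &str) -> Vec<Range<usize>> {
    let decoded = decode_with_ranges(src);
    let mut ranges = Vec::new();
    let mut i = 0;
    while i < decoded.len() {
        let (c, range) = &decoded[i];
        let next = decoded.get(i + 1).map(|(c, _)| *c);
        match (*c, next) {
            ('{', Some('{')) | ('}', Some('}')) => i += 2,
            ('{', _) => match decoded[i..].iter().position(|(c, _)| *c == '}') {
                Some(offset) => {
                    ranges.push(range.start..decoded[i + offset].1.end);
                    i += offset + 1;
                }
                None => break,
            },
            _ => i += 1,
        }
    }
    ranges
}
```

A scan over the raw source would stop at the `}` that closes `\u{61}` in `{\u{61}}` and report a specifier that is one byte too short. Scanning the decoded characters avoids that, because the whole escape counts as a single character with its own source range.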

I have updated the PR to support escape sequences and Unicode identifiers.

matklad · 2020-04-24T20:08:22Z

bors r+

bors · 2020-04-24T20:17:51Z

Build succeeded:

Geobert · 2020-04-27T12:40:56Z

Running nightly 7a9ba16, I don't see this working. Do I need to add some configuration or configure their color?

lnicola · 2020-04-27T12:47:54Z

Try

    "editor.tokenColorCustomizationsExperimental": {
        "attribute": "#ff0000"
    },

Is the HighlightTag::Attribute suitable for highlighting the {}?

My gut feeling is no 😄.

Geobert · 2020-04-27T12:51:26Z

Thanks, it was that indeed, but I don't want to change my attributes' color, so I agree with @lnicola: no, attribute is not suitable for {} ^^'

ltentrup · 2020-04-28T07:48:16Z

I wasn't aware that it is possible to introduce new tokens. The follow-up PR #4183 implements this change.

4183: Introduce new semantic highlight token for format specifier r=matklad a=ltentrup Follow up from #4006: Instead of using the `attribute` highlight token, introduce a new semantic token for format specifier. Co-authored-by: Leander Tentrup <[email protected]>

matklad reviewed Apr 17, 2020

Refactor flattening logic for highlighted syntax ranges

29a8464

ltentrup force-pushed the highlight-format branch from f08c96f to b9c3885 Compare April 18, 2020 13:04

matklad reviewed Apr 18, 2020

ltentrup force-pushed the highlight-format branch from b9c3885 to ac798e1 Compare April 20, 2020 09:19

bjorn3 reviewed Apr 20, 2020

Apply suggestions from code review

b2829a5

Co-Authored-By: bjorn3 <[email protected]>

ltentrup force-pushed the highlight-format branch from 5e30df7 to b2829a5 Compare April 22, 2020 08:18

Adapt format specifier highlighting to support escaped squences and unicode identifiers

445052f

bors bot merged commit 51a0058 into rust-lang:master Apr 24, 2020

Veetaha mentioned this pull request Apr 25, 2020

Format string literal syntax highlighting is broken for escapes and interpolations simultaneously #4138

Closed

ltentrup deleted the highlight-format branch April 28, 2020 07:21

ltentrup mentioned this pull request Apr 28, 2020

Introduce new semantic highlight token for format specifier #4183

Merged

lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Jan 7, 2025

Merge pull request rust-lang#4006 from JoJoDeveloping/tb-fix-3846-retag

afd9ad4

Fix rust-lang#3846 properly, so that subtrees can be skipped again
