-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syntax highlighting for format strings #4006
Conversation
I guess it doesn't fix it, but CC #3419. |
Thanks, I forgot to include a link! It is related, but for actually checking the syntax it is probably needed to create a separate parser, which this PR does not do. |
static ref RE: Regex = | ||
Regex::new(r"[^\{](?P<format>\{(?:\}|[^\{].*?\}))[^\}]").unwrap(); | ||
} | ||
for (start, end) in RE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather then using ad-hoc regex parsing, I suggest that we install some future proofing here. Eventually, we'd want to handle and color escape sequences properly, as well as check the semantics of format specifiers inside {}
.
So I think we need a new function, lex_string_literal
, which returns inner "tokens" comprising the string. I think it should live in ra_syntax/ast/tokens.rs
as a method on ast::String
. Ideally, it should have the following return type
enum StrngPiece {
Quote,
LiteralSequence,
EscapeSequence,
FormatSpecifier,
}
fn lex(&self) -> impl Iterator<Item = (TextRange, StringPIece)> { ... }
We, however, already use internal iteration for escape sequences (see the call to unescape_str
), so, instead of returning an iterator, we might just accept the function with (TextRange, StringPIece)
argument. I don't think we necessary need to handle escape sequences in this PR (although we might do that as well, they also should be colored), but I think it's important to properly isolate string lexer into a separate function, becase we'll have to extend it eventually to handle everything. Oh, and I think it's better to avoid regex here and code everything in a manual way, as we'd need that for escape sequences anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to your detailed description it was straightforward to implement 👍
I have updated the PR to include the manual lexer (for format specifier and escape sequences). I have also included some tests for the happy path, in case of an error I opted to just return the neutral StringPiece::LiteralSequence
.
The current implementation does not support RawStrings, though.
f08c96f
to
b9c3885
Compare
crates/ra_syntax/src/ast/tokens.rs
Outdated
// unicode escape | ||
chars.next(); | ||
let mut cloned = chars.clone().take(8); // up to 6 digits + opening `{` and closing `}` | ||
if let Some(next_char) = cloned.next() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wasn't clear about this, we already have logic for this implemented in rustc_lexer:
So, rather than re-doing this, we should use the unscape_xxx
family of funcitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unescape function cannot be used in a compositional way together with the format specifier so I decided to focus in this PR on the format specifier. In a follow-up it should be easy to include escape sequences by using the unescape functions and a second iteration over the string as escape sequences and format specifier should have mutual exclusive ranges (and the sorting of syntax ranges was implemented in #4022).
Detailed changes: 1) Implement a lexer for string literals that divides the string in format specifier `{}` including the format specifier modifier. 2) Adapt syntax highlighting to add ranges for the detected sequences. 3) Add a test case for the format string syntax highlighting.
b9c3885
to
ac798e1
Compare
crates/ra_syntax/src/ast/tokens.rs
Outdated
// integer | ||
read_integer(&mut chars, initial_len, callback); | ||
} | ||
'a'..='z' | 'A'..='Z' | '_' => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do format specifiers allow unicode identifiers on nightly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe they are actually allowed even on stable (which can be considered a bug):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for those who can't read Russian, it means "nonsense" according to google translate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I would rather translate that as "wow". A picture is worth a thousand words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I based the code on https://doc.rust-lang.org/reference/identifiers.html which says identifiers are ASCII only. Escape sequences work on stable as well: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7ad2018ba9899b4364a042eeb7dd4789
That means one has to interleave escape sequences and format specifier lexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR to support escaped sequences and Unicode literals
Co-Authored-By: bjorn3 <[email protected]>
5e30df7
to
b2829a5
Compare
…nicode identifiers
bors r+ |
Build succeeded: |
Running nightly 7a9ba16, I don't see this working. Do I need to add some configuration or configure their color? |
Try "editor.tokenColorCustomizationsExperimental": {
"attribute": "#ff0000"
},
My gut feeling is no 😄. |
Thank it was that indeed, but I don't want to change my attributes color, so I agree with @lnicola , no, |
I wasn't aware that it is possible to introduce new tokens, the follow-up PR #4183 implements this change. |
4183: Introduce new semantic highlight token for format specifier r=matklad a=ltentrup Follow up from #4006: Instead of using the `attribute` highlight token, introduce a new semantic token for format specifier. Co-authored-by: Leander Tentrup <[email protected]>
Fix rust-lang#3846 properly, so that subtrees can be skipped again
I have an implementation for syntax highlighting for format string modifiers
{}
.The first commit refactors the changes in #3826 into a separate struct.
The second commit implements the highlighting: first we check in a macro call whether the macro is a format macro from
std
. In this case, we remember the format string node. If we encounter this node during syntax highlighting, we check for the format modifiers{}
using regular expressions.There are a few places which I am not quite sure:
HighlightTag::Attribute
suitable for highlighting the{}
?Let me know what you think, any feedback is welcome!