[semantic] proposals for new standard semantic token types #97063
Comments
Types
Modifiers
All extra tokens and types defined by rust-analyzer: https://github.com/rust-analyzer/rust-analyzer/blob/9cb55966fe0fee791072f275ac55b90b8ee13e32/editors/code/package.json#L522-L572 Hm, actually, |
While I think it makes sense to add in things like As a result, I'm wondering: should the resulting LSP semantic tokens formal specification also include some guidance on which token types and modifier types should be considered as important to have different colors? To sort of establish a baseline on what a theme is expected to cover. Or would the expectation really be that all theme authors pick something for a relatively specific thing like a While this doesn't directly matter to the protocol implementation on either side of course, I feel like it could probably be pretty relevant for how it all plays together in the end to give some guidance for theme authors here. |
Note that "documentation" exists today, but without any documentation as to when it's to be used, and what semantic meaning it has, so this is a proposal that
References: |
Should there be a token type for things like TypeScript's decorators (Dart has something similar, called annotations):
// TypeScript
function foo() {}
@foo
function bar() {}
// Dart
@mustCallSuper
void foo() {}
I don't think any of the existing ones fit? |
Yes, I agree that a token type |
We have been talking about |
rust-analyzer also provides something similar (called The |
There are types for |
Hate to resurrect a stale comment thread, but hey, how bout that decorator/attribute/annotation token. :) I love semantic highlighting but it's killing me that the |
@aeschli do you have any plans to extend this in VS Code? |
What's the status on the I think some coordination is required between LSP and VS Code here, to make sure that standard LSP token types are also standard in VS Code, as well as agreeing on a name ( |
annotation, builtinType, typeAlias, union mentioned above would all be useful for clangd (C++). unresolved or maybe "unknown" too, and I think it should be a type rather than a modifier. (For those familiar with C++ templates, dependent names could be modeled as a modifier, and their tokens would be either Type+DependentName or Unknown+DependentName) |
What do people think of modifiers for scope? Maybe function/class/module/global
These are loose, but distinguishing global variables from function-locals at a glance seems pretty useful! |
modifiers for scope would be useful. RustAnalyzer has some custom types that somewhat work along those lines:
Rust doesn't have global in the same way, but the same spectrum of types applies. |
Slightly related (though not sure if these should be types or modifiers):
|
I added a new type |
@aeschli should |
Added it. |
@aeschli I had a request for additional modifiers so that a theme author can customise colours of some keywords specifically: It feels awkward to provide a modifier for each language keyword - are there any guidelines on how fine-grain these should be? Would it be a reasonable/feasible VS Code feature request to allow the text content of a token to be used by theme authors? (for ex. |
@DanTup What I generally try to do is use the TextMate grammar to map out most of the syntax, and only use semantic tokens to give semantic meaning to identifiers (e.g., to distinguish between a class, an interface, and a type alias, which is something you can't really do without parsing the source code). Keywords are trivial to catch with a regular expression, and then you can just use a back-reference to insert the matched text into the TM scope:
{
  "match": "\\b(if|else|switch|case|for|while|break)\\b",
  "name": "keyword.control.$1.languageid"
}
Then a theme author could use, e.g., "keyword.control.for" to make that specific keyword its own color if they really wanted to. Marking up the entire syntax with semantic tokens is something I would try to avoid personally (or hide behind a configuration flag if you need to provide those tokens for editors other than VS Code), because VS Code treats semantic tokens a bit like an ID selector in CSS, which really limits the flexibility of theme authors and end users to customize the syntax colors in a granular way. |
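To make the back-reference idea above concrete, here is a minimal TypeScript sketch that imitates what the grammar rule does: it finds each keyword and substitutes the captured text for `$1` in the scope name. The helper `scopesFor` and the constant `scopeTemplate` are hypothetical names used only for illustration; this is not how VS Code's TextMate engine is implemented, just a model of the resulting scope names.

```typescript
// Scope template taken from the rule in the comment above; "$1" stands for the first capture group.
const scopeTemplate = "keyword.control.$1.languageid";

// Hypothetical helper: returns each matched keyword with the scope name the rule would assign.
function scopesFor(line: string): { text: string; scope: string }[] {
  // A fresh regex per call, so the global flag's lastIndex state is never shared.
  const pattern = /\b(if|else|switch|case|for|while|break)\b/g;
  const found: { text: string; scope: string }[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    // split/join performs a literal substitution of "$1" with the captured keyword.
    found.push({ text: match[0], scope: scopeTemplate.split("$1").join(match[1]) });
  }
  return found;
}
```

A theme selector such as "keyword.control.for" would then match the scope produced for the `for` keyword, while identifiers that merely contain a keyword (for example `format`) are left alone because of the word boundaries.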
@dannymcgee I don't think adding configuration to the server to produce a reduced set of tokens would be a good fit here. It would mean the server has to have some knowledge of the specific client and its textmate grammar (which may change over time). I'd prefer to add additional modifiers than that, but I was hoping there could be a better way (themes are the sort of things people really like to make their own, so being able to customise some specific tokens without the servers needing to mark them all up individually seems like a powerful feature). |
@DanTup Currently we need all semantic token types and modifiers to be known beforehand. So yes, there's no alternative to listing them. |
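One reason types and modifiers have to be declared up front is the wire format: as far as I understand the LSP encoding, each token is sent as five integers (line delta, start-character delta, length, an index into the legend's type list, and a bit set over the legend's modifier list), so a name that is not in the legend cannot be encoded. The sketch below illustrates that scheme under the assumption that tokens arrive sorted by position; `Legend`, `Token`, and `encodeTokens` are hypothetical names, not the API of any particular library.

```typescript
// Hypothetical shapes for illustration; real clients and servers define their own types.
interface Legend { tokenTypes: string[]; tokenModifiers: string[]; }
interface Token { line: number; startChar: number; length: number; type: string; modifiers: string[]; }

// Encodes tokens (assumed sorted by line, then character) as five integers each.
function encodeTokens(tokens: Token[], legend: Legend): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of tokens) {
    const typeIndex = legend.tokenTypes.indexOf(t.type);
    if (typeIndex < 0) throw new Error(`token type not in legend: ${t.type}`);
    let modifierBits = 0;
    for (const m of t.modifiers) {
      const bit = legend.tokenModifiers.indexOf(m);
      if (bit < 0) throw new Error(`token modifier not in legend: ${m}`);
      modifierBits |= 1 << bit; // one bit per modifier, by legend position
    }
    const deltaLine = t.line - prevLine;
    // The start character is relative to the previous token only when both are on the same line.
    const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
    data.push(deltaLine, deltaStart, t.length, typeIndex, modifierBits);
    prevLine = t.line;
    prevChar = t.startChar;
  }
  return data;
}
```

Under this scheme, a per-keyword modifier would consume one bit of the modifier set for every keyword, which is part of why the granularity question raised in this thread matters.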
Couldn't you just use a simple toggle that either a) tokenizes everything or b) tokenizes only identifiers? (The latter is the option I would personally prefer as an end user.) It doesn't require any knowledge of the specific grammar (or even the specific client), just a general idea that certain clients may be supplementing the semantic tokens with some other tokenizer, so they only need specification of semantic (as opposed to syntactic) information. For what it's worth, it wouldn't be without precedent — that's how the TypeScript implementation works, and Rust Analyzer has an option to skip tokenizing strings. (But no pressure, obviously, it is your project. 🙂) |
If I'm understanding you correctly, the |
I don't think so - the semantic tokens are adding more value than just identifiers. There are a lot of things that are complicated to handle 100% accurately in the textmate grammar (expressions in string interpolation can include keywords, for example, and documentation comments can include full code blocks). Even with a built-in toggle, it seems like assumptions would have to be made about what the client is otherwise colouring, and unless/until LSP allowed us to provide the textmate grammar to the client, that's something I'd prefer not to make assumptions about (at least, not for something minor like a small number of users wanting to customise colours of a few specific keywords). My real question is about how fine-grained these tokens/modifiers can/should be. I can easily handle this by just adding a custom modifier for every keyword (we already have a lot of custom modifiers to help theming), and that feels better to me than producing a restricted set of tokens - but I don't think it's as good as VS Code having more flexible built-in theming (since anything I do specifically for my language will not necessarily be consistent with other languages). |
So I have my own custom theme that uses a mapping from semantic tokens -> textmate tokens so that I can write my theme entirely semantically and have it work on non-semantic languages automatically. For the most part, the semantic tokens cover what I've come across; however, a few additional semantic token types would be helpful, as quite a few tokens simply have no corresponding semantic token to denote them. Of note is the lack of semantic tokens for HTML/XML-like tokens (semantically, I don't feel the existing tokens cover any of these, even if some could be contrived, like class<->tag):
While adding rules for JS, I found that distinguishing the following would be particularly helpful:
|
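The semantic-to-TextMate mapping described in the comment above could be sketched roughly as follows. This is an assumption about how such a theme might be organised, not the commenter's actual implementation: the table entries are illustrative, and `semanticToTextMate` and `textMateScopesFor` are hypothetical names.

```typescript
// Illustrative mapping from "type" or "type.modifier" selectors to TextMate scopes.
const semanticToTextMate = new Map<string, string[]>([
  ["variable.readonly", ["variable.other.constant"]],
  ["variable", ["variable.other.readwrite"]],
  ["function", ["entity.name.function"]],
  ["class", ["entity.name.type.class"]],
]);

// Prefers a modifier-specific entry, then the plain type, then no scopes at all.
function textMateScopesFor(type: string, modifiers: string[]): string[] {
  for (const m of modifiers) {
    const specific = semanticToTextMate.get(`${type}.${m}`);
    if (specific !== undefined) return specific;
  }
  return semanticToTextMate.get(type) ?? [];
}
```

A token kind with no entry (an HTML/XML tag, for instance) falls through to an empty list, which is the gap the comment is pointing at.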
@Jamesernator Thanks a lot for sharing! |
I do not have a clear idea of what kind of modifier to add. Something like what @DanTup suggested here seems an option to me, but I'm far from having a deep understanding. While trying to create a theme, though, I found that the scope of variable.defaultLibrary in JS and TS is very broad and overrides quite a lot; other *.defaultLibrary tokens in various languages probably do too. I'd guess that I'm not the only one who would like to give visual preference to certain built-in constructs over others. I came upon this when I tried to give special emphasis to console, which by nature has (for me) a very different scope and use than built-in constants like Math. |
How about Command Arguments (alternative names could be bare quote strings or generic tokens)? |
The new semantic token provider API comes with a list of standard token types and modifiers.
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#semantic-token-classification
These types serve as a base across languages; having all or most providers use them will make it easier to write theming rules that work across languages.
That said, semantic token providers are not forced to stick to the standard, but can add new types/modifiers, or extend existing types as seen in the doc.
This issue is to collect proposals for new types and modifiers. When making a suggestion, please add a description and a small code sample. If it exists, name the corresponding TextMate scope.
The standard token types should be applicable across multiple languages and be useful for theming. We want to keep the set of standard tokens consistent and coherent.
Proposed types:
Proposed modifiers:
References:
(1) microsoft/language-server-protocol#968
(2) microsoft/vscode-languageserver-node#604