Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further semantic tokens improvements #1641

Merged

Conversation

0dinD
Copy link
Contributor

@0dinD 0dinD commented Jan 10, 2021

Fixes redhat-developer/vscode-java#1753.
Fixes part of redhat-developer/vscode-java#1384.

New token types

  • javaModule namespace (for module names inside module-info.java)
  • javaClassLiteralKeyword keyword (Example: class in String.class)
    • This is especially helpful in vscode-java, since the TextMate grammar doesn't assign a scope for this.
  • javadocBlockTag keyword (@author, @param, @return etc.)
  • javadocInlineTag keyword (@code, @link, @value etc.)

Edit: the new token types were changed to more generic token types from LSP, see discussion with Eskibear.

New token modifiers

  • documentation (Applied to all tokens inside of Javadoc comments)
    • Apart from the new Javadoc tag tokens, tokens for parameters, types, methods etc. are now provided when referenced by @param, @link etc. in a Javadoc comment.

I refactored the semantic tokens visitor a little bit by adding more helper methods, which clean up the visitation logic and prevent code duplication. This also helps make the visitation logic easier to read and write, which has allowed me to spot and fix a few minor bugs in the previous logic. Notably, type arguments of method and constructor invocations weren't taken care of, so they were missing the typeArgument modifier.

A note about the tests: I mostly just added test cases for the new token types and modifiers, as well as the bugs I fixed. But in order to add a module-info.java to the test project, I had to make some changes. Mainly, you'll notice I converted the project from a simple Eclipse project to a Maven project. This was done in order to prevent the extension from freaking out about conflicting packages when there's an Eclipse project in the test projects folder that has a module-info.java on it's classpath. Looking at the only other test project with a module-info.java, I found out that by only having a pom.xml, and not a .classpath, the errors no longer appear. So in order to attach the test library for the semantic tokens project without creating a .classpath, I added a method in SemanticTokensCommandTest.

More details

This is what happened when adding the module-info.java:
issue
There are thousands of problems like this:
problems-list
And lots of errors like these in the log:

!ENTRY org.eclipse.jdt.core 4 4 2021-01-08 15:35:58.592
!MESSAGE Failed to find package fragment root for projects/maven/semantic-tokens/src/main/java (not open) [in org.eclipse.jdt.ls.tests]
!STACK 1
Java Model Exception: Java Model Status [projects/maven/semantic-tokens/src/main/java [in org.eclipse.jdt.ls.tests] does not exist]
  at org.eclipse.jdt.internal.core.JavaElement.newNotPresentException(JavaElement.java:573)
  at org.eclipse.jdt.internal.core.PackageFragmentRoot.getUnderlyingResource(PackageFragmentRoot.java:761)
  at org.eclipse.jdt.internal.core.SearchableEnvironment.findModuleContext(SearchableEnvironment.java:1047)
  at org.eclipse.jdt.internal.core.SearchableEnvironment.findType(SearchableEnvironment.java:480)
  at org.eclipse.jdt.internal.core.CancelableNameEnvironment.findType(CancelableNameEnvironment.java:63)
  at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.createPlainPackage(LookupEnvironment.java:1135)
  at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.buildTypeBindings(CompilationUnitScope.java:135)
  at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.buildTypeBindings(LookupEnvironment.java:475)
  at org.eclipse.jdt.internal.compiler.Compiler.internalBeginToCompile(Compiler.java:855)
  at org.eclipse.jdt.internal.compiler.Compiler.beginToCompile(Compiler.java:394)
  at org.eclipse.jdt.core.dom.CompilationUnitResolver.resolve(CompilationUnitResolver.java:1170)
  at org.eclipse.jdt.core.dom.CompilationUnitResolver.resolve(CompilationUnitResolver.java:712)
  at org.eclipse.jdt.core.dom.ASTParser.internalCreateAST(ASTParser.java:1205)
  at org.eclipse.jdt.core.dom.ASTParser.createAST(ASTParser.java:820)
  at org.eclipse.jdt.core.manipulation.CoreASTProvider$1.run(CoreASTProvider.java:272)
  at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
  at org.eclipse.jdt.core.manipulation.CoreASTProvider.createAST(CoreASTProvider.java:264)
  at org.eclipse.jdt.core.manipulation.CoreASTProvider.getAST(CoreASTProvider.java:197)
  at org.eclipse.jdt.ls.core.internal.handlers.CodeActionHandler.getASTRoot(CodeActionHandler.java:246)
  at org.eclipse.jdt.ls.core.internal.handlers.CodeActionHandler.getCodeActionCommands(CodeActionHandler.java:88)
  at org.eclipse.jdt.ls.core.internal.handlers.JDTLanguageServer.lambda$14(JDTLanguageServer.java:640)
  at org.eclipse.jdt.ls.core.internal.BaseJDTLanguageServer.lambda$0(BaseJDTLanguageServer.java:75)
  at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
  at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
  at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
  at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
  at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
  at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

If anyone has a better solution for working around what I assume might be a bug with this extension (or an upstream library), please tell me about it. Note that a similar issue will happen if you open a module-info.java not inside an Eclipse project, but only for the currently opened files in the editor. It will also disappear after restarting VS Code, so it's not as much of an issue as the other one. It can be reproduced with this module-info.java as well.

The last change I made to the tests was this one, which was necessary because I had to change the compiler compliance level to 11 now that the test project is modular. The change from class to namespace is because of the fact that in Java 9+ (due to the module system), the AST won't contain a recovery type binding for NonExistentClass, since the java package isn't accessible. Not sure if we should work around this in some way, or even what the expected result should be in this case.

If anyone's interested in why ModuleDeclaration#getJavadoc() always returns null (which is why I needed this workaround), it seems to be unimplemented in the ASTConverter of JDT Core. If support for resolving the Javadoc of a module declaration is added in the future, we can remove the workaround.

Copy link
Contributor

@Eskibear Eskibear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This PR fixes a lot of bugs.

Things look good:

  • support for module descriptor
  • fix for tags in comments like @link
  • cover edge case for modifer typeArgument
  • refactoring of the visitor, which looks easier to understand.

Things I don't like:

  • new token types for different keywords.

@@ -45,7 +45,8 @@
GENERIC("generic"),
TYPE_ARGUMENT("typeArgument"),
DECLARATION("declaration"),
IMPORT_DECLARATION("importDeclaration");
IMPORT_DECLARATION("importDeclaration"),
DOCUMENTATION("documentation");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation is one of the standard modifiers, which should be placed above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is deprecated and declaration. I didn't group the tokens based on whether they are standard in LSP, just Java modifiers (as in, keyword-based) vs "semantic" modifiers. Should I re-group them based on whether they are standard LSP tokens or custom jdt.ls tokens?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead to re-group them. It's more clear, and later we might directly reference standard modifiers from LSP4J, and leave those Java specific ones here.

// For example, "module" typically has the same meaning as "namespace"
// in other languages (Rust, for example). But in Java, modules are a
// completely separate concept, packages being the "namespace" tokens.
MODULE("javaModule"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"module" is fine with me, as this is a Java language server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking for a while, or we can directly re-use namespace for modules. I don't see any case in Java to distinguish packages and modules, as they don't occur in the same file... both are some kind of namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"module" is fine with me, as this is a Java language server.

Well, the problem is that the token types from the jdt.ls will coexist with token types from other language servers. As a theme author, you won't necessarily know what language you're styling the tokens for (that's why LSP defines "standard", generic token types). So someone who doesn't know about Java modules might see that the token type module is available, and style it based on the expectation that it has the same meaning as namespace. By prefixing it with java, it becomes clear that this is a Java-specific concept, and is semantically different from, let's say, modules in Rust. I don't think it matters that much in this particular case, but out of principle I made the decision to prefix it. Here's a quote from aeschli about this from an earlier discussion:

If a custom token type or modifier is very language specific, it makes sense to prefix it that way: javaMember (extends member). But I don't think javaMember is a good example of a very 'java' specific concept.

This is also how the C# extension handles many language-specific token types, see their semanticTokenTypes contribution. Additionally, they even declare a module token type, which could potentially cause confusion with Java modules if they were not named javaModule. Although since the token type has no description I'm not entirely sure what it's used for anyway, especially since I'm no C# expert.

After thinking for a while, or we can directly re-use namespace for modules. I don't see any case in Java to distinguish packages and modules, as they don't occur in the same file... both are some kind of namespace.

They can appear in the same file, since there are directives in module-info which refer to both module names and package names. Just see the test case for modules and you'll see what I mean. Personally I might style packages and modules the same, but they are semantically different and I don't like the idea of taking away customization abilities from theme authors without good reason.

If you want me to rename javaModule to module or just namespace I'm personally fine with that, but I think you should consider the above first.

// in other languages (Rust, for example). But in Java, modules are a
// completely separate concept, packages being the "namespace" tokens.
MODULE("javaModule"),
TYPE_LITERAL_KEYWORD("javaClassLiteralKeyword"), // genericName taken from: https://docs.oracle.com/javase/specs/jls/se11/html/jls-15.html#jls-15.8.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, it's "class" part of "String.class". To highlight a "variable"-like "keyword", here:

  • adds a new token type
  • to avoid regression, clients are required to explicitly declare new types.
    It makes not too much sense to me. Suggest to reuse "keyword" token.

Copy link
Contributor Author

@0dinD 0dinD Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you've understood that correctly.

to avoid regression, clients are required to explicitly declare new types.
It makes not too much sense to me. Suggest to reuse "keyword" token.

It's not a regression because the class postfix keyword (which technically isn't really a keyword, so if you have a better name please suggest one) was not previously part of semantic highlighting. In fact it wasn't even part of the VS Code Java TextMate grammar, which is the main reason I wanted to add it via semantic highlighting. The only reason I scope-map it in redhat-developer/vscode-java#1760 is so that existing themes will see it as a keyword, which does actually change the colorization, but for the better in my opinion (previously it was treated as a field/property, from the TextMate grammar, but that doesn't make sense to me).

Using the keyword token is fine, but as I've said previously I don't know if I like the idea of making token types generalized when they can be specific. As far as the syntax highlighting in VS Code goes, I've already supertyped javaClassLiteralKeyword to keyword in redhat-developer/vscode-java#1760.

In some cases I do agree that being overly specific about token types is unnecessary, like differentiating between class, interface, enum keyword etc. But the class literal keyword is very different from, for example, the type declaration keywords or modifier keywords, because it's a postfix, which makes it seem like it's almost a field.

// completely separate concept, packages being the "namespace" tokens.
MODULE("javaModule"),
TYPE_LITERAL_KEYWORD("javaClassLiteralKeyword"), // genericName taken from: https://docs.oracle.com/javase/specs/jls/se11/html/jls-15.html#jls-15.8.2
JAVADOC_TAG("javadocBlockTag"), // genericName taken from: https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html#block-tags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to reuse "keyword" token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this takes away the capability for theme authors to style block tags differently from inline tags, which I don't really like (and in this case I would personally like to style inline tags differently).

MODULE("javaModule"),
TYPE_LITERAL_KEYWORD("javaClassLiteralKeyword"), // genericName taken from: https://docs.oracle.com/javase/specs/jls/se11/html/jls-15.html#jls-15.8.2
JAVADOC_TAG("javadocBlockTag"), // genericName taken from: https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html#block-tags
JAVADOC_NESTED_TAG("javadocInlineTag"); // genericName taken from: https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html#inline-tags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to reuse "keyword" token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takes away capabilities from theme authors, see my other comments.

@Eskibear
Copy link
Contributor

I agree that on theme author's perspective, it's always good that LS expose more details (e.g granularity of token types, a generic keyword or a bunch of different keyword token types), then you'll be able to provide more flexibility and customization.

The question is how "specific" we are going to be. Taking keywords for example, we could invent specific token types for every keyword, e.g. "newKeyword", "forKeyword", "whileKeyword"...etc, then users could use whatever color customization they want for each keyword if they want. We need balance between generic and specific, and I do not see common requirements to add above types for the moment.

I do want to get this PR merged for those fixes and improvements, and for simplicity I suggest that:

  • In this PR we don't invent new token types, just using namespace for modules, and keyword for the others.
  • If you do want to have those new specific types, let's open a issue for open discussion first.

@0dinD 0dinD force-pushed the further-semantic-tokens-improvements branch from a9f0cba to 617a7ce Compare January 20, 2021 14:44
@0dinD
Copy link
Contributor Author

0dinD commented Jan 20, 2021

The question is how "specific" we are going to be. Taking keywords for example, we could invent specific token types for every keyword, e.g. "newKeyword", "forKeyword", "whileKeyword"...etc, then users could use whatever color customization they want for each keyword if they want. We need balance between generic and specific, and I do not see common requirements to add above types for the moment.

I think "newKeyword", "forKeyword", "whileKeyword" etc. is a bit too specific, but my point about the class literal "keyword" is that it's different from other keywords (being a postfix).

But I agree that we should take all of that in a separate discussion, so I've committed a change to revert the new tokens to the more generic LSP tokens, losing some information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@link highlighting broken when linking to class
3 participants