Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve semantic token modifiers #1539

Merged

Conversation

0dinD
Copy link
Contributor

@0dinD 0dinD commented Sep 5, 2020

This PR improves the semantic token modifiers in the following ways:

New token modifiers

  • transient
  • volatile
  • default
  • synchronized
  • native
  • strictfp
  • generic
  • typeArgument (Example: String and Integer in Map<String, Integer>)
  • importDeclaration (Every simple name inside of an import declaration)

New token types

  • modifier

Edit: some of the new modifiers were scrapped, see discussion with Eskibear

See this screenshot for an example using some of the new modifiers:

new-modifiers

Performance improvements

  • A bitmask is now used for the token modifiers, even before encoding.
    • Seems like an improvement over streaming them into an array, only to convert them back into a bitmask during encoding.
    • Makes it easier to combine token modifiers in steps, instead of doing it all at once.
  • The old implementation sometimes called SimpleName#resolveBinding() twice for the same node, this is now avoided.
  • An int[] is now used instead of List<Integer> for the encoded tokens, since the length of the array is known.

I do not have access to any fancy profiling tools, but from my testing the new implementation seems to be significantly faster than the old one, even with my additions to the token modifiers.

Note

The only thing this PR is missing right now is some new tests for the new modifiers, which I will write when I have time. In the meantime, please verify this PR and let me know if there's anything that needs to change.

One thing that I would like your input on is what to call the importDeclaration token modifier. Is this a good name? Some other ideas that I had were: imported, importTarget or importDeclarationTarget. There is no standard token modifier for this.

@fbricon
Copy link
Contributor

fbricon commented Sep 9, 2020

@0dinD importDeclaration is fine by me. @Eskibear can you please review this? I'm definitely not the semantic highlighting expert

Copy link
Contributor

@Eskibear Eskibear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me. Performance improvement looks promising. Just wondering whether we have to add all above modifiers.


public class SemanticTokensLegend {
private final List<String> tokenTypes;
private final List<String> tokenModifiers;
public static final SemanticTokensLegend INSTANCE = new SemanticTokensLegend();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok to simplify it by removing SemanticTokenManager. But here it's no longer lazy-loaded. Suggest to keep the original Initialization-on-demand_holder_idiom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see the original purpose of SemanticTokenManager now. But I just thought it was a bit unnecessary after the changes I made, since it was only used to get the SemanticTokensLegend instance. I can re-implement the lazy-loading in SemanticTokensLegend instead.

But now that I think about it, is there any reason to keep the legend in-memory at all in Java-land? Couldn't we just create a new instance when requested by the client? Because the server never uses the legend, and the client only needs to request it once and then keep it in it's own memory. Does the server have any reason to keep holding a static instance of the legend then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason to keep the legend in-memory at all in Java-land?

It's a good finding. Indeed, we don't have to have the Legend, as it simply holds all possible values of TokenType and TokenModifier.

Does the server have any reason to keep holding a static instance of the legend then?

When client talks to the server, Legend is their contract, and it was used to encode tokens. With consistency in mind, I just implemented in that way.

}
};
// Standard Java modifiers
PUBLIC("public"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we have to add all "standard Java modifiers", because ultimately we'll align modifiers with LSP3.16 as much as possible, via LSP4J.

And massive modifiers would be potentially anonying (we did receive such feedback). I'm thinking, do people actually want to know below information from highlighting?

transient, volatile, default, synchronized, native, strictfp

My suggestion is to only keep important modifiers you are actually consuming now, also to avoid future breaking changes (when aligning to LSP) as best as we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I do see your point here. The main purpose of this PR was to add the generic, typeArgument and importDeclaration modifiers. But I realised that adding the rest of the standard Java modifiers was easy and computationally cheap, so I thought "why not?".

I'm not sure I understand what you mean by "align modifiers with LSP3.16"? We still need to have some custom modifiers that are not part of LSP3.16, and I don't see why these would be any different. Regarding the "massive modifiers", I think it would be rare to actually see all of those modifiers used together in a token, since many of them are mutually exclusive, or at least rarely used together.

To be clear, I also think some of the standard Java modifiers are rarely going to be used, like strictfp for example. But I just didn't see any particular reason not to add the information, since it's trivial to check them in the same way as public, static, readonly etc. If you want me to remove the new "standard Java modifiers", I am also fine with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that strongly opposed to that (as you mentioned you don't see any slow-down), my point is just "why to add them if they are rarely going to be used". It's indeed trivial to check them, it may cost tiny but still some time to check...

I still prefer not to add them here. One day if we do consume those modifiers, it's also trivial to add them back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I do see your point. I'll remove those modifiers from this PR in my next iteration, which I'll probably commit later this week. But can we keep the native modifier? I don't see much use for the other modifiers (transient, volatile, default, synchronized and strictfp), but native is something that I might actually use myself, as it's quite common to see when using native libraries.

@0dinD 0dinD force-pushed the semantictokens-modifier-improvements branch from 964cba6 to d4c4fd3 Compare September 15, 2020 17:41
@0dinD
Copy link
Contributor Author

0dinD commented Sep 15, 2020

I have now addressed the review comments, and added some tests for the new modifiers.

Additionally, I replaced the token assertion helper methods in SemanticTokensCommandTest with a new TokenAssertionHelper class using the builder pattern. It does the same thing, but is a lot nicer to work with since you don't have to figure out the line and column of every token. This is done by going through all provided tokens in order, and then asserting that the text at their position matches the expected text. Aside from making future tests easier to write, the assertion helper also provides more detailed error messages when the assertions fail.

@Eskibear what do you think of these changes? Have I missed anything in my new assertion helper? It's a big rewrite, but I think it's worth it in order to make the tests easier to write. I've already been able to add lots of new assertions with it that would have taken much more effort to write with the old assertion helper methods.

Copy link
Contributor

@Eskibear Eskibear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@0dinD
Copy link
Contributor Author

0dinD commented Sep 16, 2020

Great!
Before this is merged, perhaps we should discuss whether or not to make #1514 a part of this PR? If we use the second alternative I presented I could add the constructor modifier to this PR.
What do you think @fbricon @Eskibear?

@Eskibear
Copy link
Contributor

add the constructor token to this PR

A constructor is a special function, so for me it's more reasonable to use function as token type, and add a constructor modifier (if you cannot find a proper one in LSP pre-defined modifers).

Actually the idea doesn't make much sense to me. I don't see any use case where we want to distinguish constructor with function in color. E.g.

class Foo {
  public Foo() {
  }
}

Foo obj = new Foo(); // <--- You already know it's a constructor because of the `new` keyword before it.

@fbricon
Copy link
Contributor

fbricon commented Sep 16, 2020

While testing #1545, I noticed Java 15's sealed keyword is not supported. Can this PR fix it, or do you prefer I opened a separate ticket?
Screen Shot 2020-09-16 at 12 24 31 PM

@Eskibear
Copy link
Contributor

Semantic highlighting doesn't cover keywords. Similar with record in Java 14, I'm afraid it has to be fixed in https://github.com/atom/language-java

@Eskibear
Copy link
Contributor

just opened a ticket atom/language-java#232

@0dinD
Copy link
Contributor Author

0dinD commented Sep 16, 2020

@Eskibear I meant to say "constructor modifier", but I do agree that constructors should be considered methods rather than types. My point was that other users might disagree, and prefer to style constructors based on their type instead. By making it a modifier on the types, both sides of the argument could get what they want, either styling it via the type or just like a function with *.constructor.

But since I don't want the feature myself, and since it could create some other problems, I think I'll scrap the idea for now. If we get a feature request for it in the future, I know how to implement it.

@fbricon I haven't looked at the Eclipse JDT update for Java 15 yet, buy I'm pretty sure it should be possible by adding a new token type for keywords. That's something I was going to look at in a future PR, this one is just for some new token modifiers. But if you want me to add this feature to this PR, I could also do that.

@Eskibear I'm quite sure that there is a standard token type for keywords, buy the problem is that the Eclipse AST doesn't seem to report the document position of most keywords. However, there is an AST node for modifier keywords, so I'm pretty sure the sealed keyword is possible to implement.

@Eskibear
Copy link
Contributor

I think I'll scrap the idea for now. If we get a feature request for it in the future, I know how to implement it.

👍

AST doesn't seem to report the document position of most keywords.

Yes, that's one of the reasons we didn't highlight keywords here. Ideally, "static" content like keywords can be coverred by TextMate-based syntax highlighting (because it's cheap), and semantic highlighting is responsible for cases not coverred by those regular expressions, mainly for identifiers (of variables, classes, methods..etc).

I'm ok that we provide more tokens (e.g. some keywords) if it's possible, as long as it does no harm to the performance.

@0dinD
Copy link
Contributor Author

0dinD commented Sep 16, 2020

As far as performance goes, I think it's going to be pretty cheap. The main bottleneck with semantic highlighting seems to lie in computing the AST, but all we have to do in order to highlight keywords is to visit some additional nodes. And I've already improved the performance via this PR, so we have some more leeway in terms of smaller improvements like this.

@0dinD
Copy link
Contributor Author

0dinD commented Sep 16, 2020

@fbricon is this PR ready to merge now or should I add the tokens for modifier keywords as well?

@fbricon
Copy link
Contributor

fbricon commented Sep 27, 2020

@0dinD is you can easily add support for the new Java 15 keywords, then please go ahead

@0dinD
Copy link
Contributor Author

0dinD commented Sep 27, 2020

Alright, I'll add the modifier keyword token type (and fix the conflicts) on Monday then.

@fbricon
Copy link
Contributor

fbricon commented Sep 27, 2020

see atom/language-java@d99f040 if it helps

@0dinD
Copy link
Contributor Author

0dinD commented Sep 27, 2020

Thanks, but the semantic highlighting feature only depends upon the AST from Eclipse JDT. I've already checked that the new keywords are included in the AST, so implementing this token type should be quite trivial.

@fbricon
Copy link
Contributor

fbricon commented Sep 28, 2020

I noticed there's also a new non-sealed keyword in JEP 360. See also atom/language-java#236

@0dinD 0dinD force-pushed the semantictokens-modifier-improvements branch 2 times, most recently from daf610c to 444dad8 Compare September 28, 2020 18:06
@0dinD
Copy link
Contributor Author

0dinD commented Sep 28, 2020

@fbricon yes, I've noticed that as well. Both the sealed and the non-sealed keywords are easy to implement in semantic highlighting, with the Modifier AST node. yield and permits are going to be harder since they are not their own AST nodes, and as such, their exact location is not available to the AST visitor. I have some workarounds in mind for adding all of the different keywords, but I think it deserves its own PR since it's going to require some more careful consideration and testing. (Also, I don't have time to work on it right now.)

I've now fixed the conflicts and added the modifierKeyword token type. It will look bad right now as it's a custom token with no styling, but that can be fixed with a PR similar to redhat-developer/vscode-java#1540. I will submit that PR soon.

Also, I didn't have time to set up JDK15 today, so please verify that sealed and non-sealed have semantic tokens. I would be very surprised if it doesn't work though, as the Javadoc for Modifier clearly mentions them.

@fbricon
Copy link
Contributor

fbricon commented Sep 29, 2020

@Eskibear any chance you can test this today? I'm planning on a release tomorrow

@fbricon fbricon removed this from the End September 2020 milestone Sep 29, 2020
@Eskibear
Copy link
Contributor

I'm wondering whether we should directly use modifier type (a standard type in LSP 3.16) instead of creating a new modifierKeyword type. If so, we don't bother to declear the token in downstream Java extension.

FYI, Related discussions:
modifier: microsoft/language-server-protocol#968 (comment)
modifierKeyword: microsoft/vscode#97063

@0dinD
Copy link
Contributor Author

0dinD commented Sep 30, 2020

@Eskibear oh, I didn't realise it was part of LSP, and only bothered to look at the VS Code docs. I think we should use modifier then, but the scope mapping would be necessary until that token has been added to VS Code. Currently it's just part of the LSP spec.

Although it is still a bit unclear to me whether or not modifier or modifierKeyword will be part of the official implementation, from the issues you linked.

@Eskibear
Copy link
Contributor

I think we should use modifier then, but the scope mapping would be necessary until that token has been added to VS Code.

Looks good. In LSP3.16, now there are modifier and keyword types. To me, the first is for "modifier keywords", and the second is for other keyword tokens.

As for the type modifierKeyword, I think we can just keep an eye on the discussion.

@0dinD 0dinD force-pushed the semantictokens-modifier-improvements branch from 879e6ed to 64965f3 Compare September 30, 2020 08:28
@0dinD 0dinD force-pushed the semantictokens-modifier-improvements branch from 64965f3 to f0e83fc Compare September 30, 2020 08:34
@0dinD
Copy link
Contributor Author

0dinD commented Sep 30, 2020

@Eskibear @fbricon I updated this PR and redhat-developer/vscode-java#1636, modifierKeyword --> modifier.

@fbricon
Copy link
Contributor

fbricon commented Oct 13, 2020

@Eskibear can you please review this one?

Copy link
Contributor

@Eskibear Eskibear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fbricon fbricon added this to the Mid October 2020 milestone Oct 14, 2020
@fbricon fbricon merged commit 0939bd1 into eclipse-jdtls:master Oct 14, 2020
@fbricon
Copy link
Contributor

fbricon commented Oct 14, 2020

Thanks @0dinD!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants