Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add language field to CompletionContext #40541

Closed
mjbvz opened this issue Dec 20, 2017 · 10 comments
Closed

Add language field to CompletionContext #40541

mjbvz opened this issue Dec 20, 2017 · 10 comments
Assignees
Labels
api feature-request Request for new features or functionality under-discussion Issue is under discussion for relevance, priority, approach

Comments

@mjbvz
Copy link
Collaborator

mjbvz commented Dec 20, 2017

Problem
For file types with embedded language, such as jsx, html, and php, it is sometimes useful to know the embedded language from which completions have been requested. I believe that #40539 for example depends on this functionality

Proposal
Add a new language field to the CompletionContext:

export interface CompletionContext {
    ...

    readonly languageId?: string
}

Alternative Design
Expose a API that allows extensions to determine the embedded language at a given position in a file. Something like:

export interface TextDocument {
     ...

     languageIdAt(position: Position): string
}

// cc @ramya-rao-a for emmet, @roblourens for php, and @aeschli for html

@mjbvz mjbvz added api feature-request Request for new features or functionality labels Dec 20, 2017
@mjbvz mjbvz added the under-discussion Issue is under discussion for relevance, priority, approach label Dec 20, 2017
@aeschli
Copy link
Contributor

aeschli commented Dec 20, 2017

Giving out the language id in a competition context is, IMO, a fair request. But I'm not sure how useful it is as one often needs more context (am I in a tag, attribute, in a string, in a comment...) to really do something.

The second suggestion, being able to query for the language id of every offset, with a sync call is costlier: It would require tokenizing the document. This likely has to happen on the extension host as syncing that information over from the renderer is expensive. Also, the renderer doesn't have the tokens all the time (typically only for to the visible lines).
A bigger problem, however, is that the language information is TextMate grammar based. TextMate tokenization has its limits, due to the fact that grammars are regexp based and the text mate engine only see a line at a time and can not make changes to already emitted scopes.
However, language servers have precise scanners and parsers and also get the corner case right.

I had many discussions with @alexandrudima on this topic and I often lobbied to give out the embedded language id, but we always came to the conclusion that it will lead down a problematic path, putting more pressure on text mate grammars (that we don't control) and ending up with plenty of bugs due to corner cases that can't be fixed due to TextMates limitations.

Our investments should go into language servers and language servers extensibility.

@mjbvz
Copy link
Collaborator Author

mjbvz commented Dec 20, 2017

Yes, we've discussed a generic "inspect token" api a few times previously and I believe we've always settled on it being potentially too expensive

Making the language servers smarter is always an option but it would be overkill for features like #40539, especially if VS Code already has the language info internally. #40539 tracks making < a suggest trigger character for JSX code. The challenge is that we don't want to trigger suggestions after you type a normal, javascript, less-than operator, only when you are in jsx and are starting a tag. A few ways this could be implemented:

Using this proposal

  1. Add < as suggest trigger character for js.
  2. Only actually trigger suggestions if the embedded language is jsx-tags

Smarter TS (hacky)

  1. Add < as suggest trigger character for js.
  2. Make TS only return suggestions immediately after a < when in jsx

Smarter TS (better)

  1. Add < as suggest trigger character for js.
  2. Add API to TS to check if we are in jsx at a given position
  3. When triggering on <, query TS Server and only return suggestions when we know we are in jsx

Embedded language aware trigger characters

  1. Allow extensions to register embedded language specific completion trigger characters
  2. Use this API to register < as a trigger character for the jsx-tags language

@ramya-rao-a Weren't you running into a some limitations around embedded languages for emmet, or have those been worked around?

@octref
Copy link
Contributor

octref commented Dec 21, 2017

For Smarter TS (better)

When triggering on <, query TS Server and only return suggestions when we know we are in jsx

Can't TS server just return empty CompletionItem[] when triggering on < outside JSX?

It seems paradoxical to me -- if a server can support embedded language, it would already have the languageId info, which would be more correct than the one generated from TM, if the server has a proper parser.

I agree with @aeschli making LS behavior dependent on TM grammar doesn't sound right.

  • TM grammar can often be wrong with scopes when program is invalid (a natural state of writing code)
  • TM scopes are expensive to re-generate on each file change
  • There are a lot of not-so-good TM grammars there due to its under-documentedness.

But I get the overkill argument. One way might be making vscode-textmate easily consumable by LS, so people can write simple JSX LS that inefficiently do some simple completion for the easy of development.

@mjbvz
Copy link
Collaborator Author

mjbvz commented Dec 21, 2017

The TS server has no knowledge of what triggered completions. We do the gating on the extension side

On the second point: we already use the embedded language id for computing snippet completions, so I don't think this proposal will add very much overhead. If there are problems with the grammars, we should look into fixing these since some of our other basic language features depend on embedded language information already

@jrieken
Copy link
Member

jrieken commented Jan 3, 2018

This topic starts to feel like "Groundhog Day". Yes, we do use some of the "language at position"-logic for snippets but that information isn't in the extension host nor exposed in any kind of API. I see how the 'CompletionContext'-proposal keeps this local but it will also make folks ask for more. I'd favour the alternative approach but in the way @octref proposes, using the vscode-textmate module directly, not making this appear too much supported and most importantly make this a little more controlled as you can provide your own grammar for your own use-case.

@ramya-rao-a
Copy link
Contributor

@ramya-rao-a Weren't you running into a some limitations around embedded languages for emmet, or have those been worked around?

I ended up doing a lot of hacks on the text around the cursor which takes care of say 95% of scenarios.

How about using the language under the cursor to determine which completion providers to call instead of using the language of the document? I believe, when completions are triggered, the document is already tokenized up till current line. This way the token/language info remains local and extensions need not know about it

@jrieken @aeschli I remember bringing this up the last time we discussed this issue, but I don't recall what the conclusion was.

If we take this route, then @mjbvz you will be registering a completion provider for jsx-tags with trigger char as <

@mjbvz
Copy link
Collaborator Author

mjbvz commented Jan 3, 2018

Thanks for the info @ramya-rao-a!

With using the embedded language to select completion providers, I worry that could impact existing completion providers scenarios if we weren't careful. If we were to turn this on by default, our css completion provider for example would start being triggered inside of html style blocks or js styled-components. We'd need a way to explicitly opt-in a completion provider for embedded languages or a way to create a fake embedded language document that we hand back to the provider (which we've previously ruled out as being too complicated).

That's why I'm proposing this smaller API addition. Since #40539 is my main use case currently, the alternative would be a TS specific solution. The TS Server already has a document ast, so I don't think we should maintain another copy of that or a copy of the tokenized file in the TS extension. Rather, we'd probably want to make the tsserver smarter so that it understands trigger characters and just does the right thing when < is pressed

@jrieken
Copy link
Member

jrieken commented Jan 4, 2018

I remember bringing this up the last time we discussed this issue, but I don't recall what the conclusion was.

One more time: A language must not know that it gets embedded. Let me invent a new programming language that embeds any other language, my TM-grammar correctly spells out what portion of code is what language. It's not fair and illogical to let IntelliSense or any other feature redirect to the actual language. For instance, how would go-lang or cpp-lang know that it is embedded in foo-lang and how would it know what portions of a file to skip?

The solution is to turn it around. If foo-lang embeds other languages it must be able to understand them. It also means that we cannot ever redirect to the (TM-guessed) language at the current cursor position (event tho it sounds like the obvious solution).

I am willing to talk about using the textmate module in the extension host (not necessarily in the API, but as a good, easy to consume node-module). That hopefully makes it more obvious what we are dealing with: A bunch of regular expressions that assign parts of a text file semantics for visual purposes (syntax highlighting), not scanners, not syntax trees.

@jrieken
Copy link
Member

jrieken commented Jan 4, 2018

Since #40539 is my main use case currently, the alternative would be a TS specific solution. The TS Server already has a document ast, so I don't think we should maintain another copy of that or a copy of the tokenized file in the TS extension. Rather, we'd probably want to make the tsserver smarter so that it understands trigger characters and just does the right thing when < is pressed

Yes, TypeScript supports jsx/tsx. It should do it properly which includes tooling and part of that is to know when to auto suggest.

@mjbvz
Copy link
Collaborator Author

mjbvz commented Jan 4, 2018

Thanks all for the discussion

Here's my proposal to make TS trigger character aware: microsoft/TypeScript#21012

@mjbvz mjbvz closed this as completed Jan 4, 2018
@vscodebot vscodebot bot locked and limited conversation to collaborators Feb 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api feature-request Request for new features or functionality under-discussion Issue is under discussion for relevance, priority, approach
Projects
None yet
Development

No branches or pull requests

5 participants