Add language field to CompletionContext #40541

mjbvz · 2017-12-20T00:52:46Z

Problem
For file types with embedded language, such as jsx, html, and php, it is sometimes useful to know the embedded language from which completions have been requested. I believe that #40539 for example depends on this functionality

Proposal
Add a new language field to the CompletionContext:

export interface CompletionContext {
    ...

    readonly languageId?: string
}

Alternative Design
Expose a API that allows extensions to determine the embedded language at a given position in a file. Something like:

export interface TextDocument {
     ...

     languageIdAt(position: Position): string
}

// cc @ramya-rao-a for emmet, @roblourens for php, and @aeschli for html

The text was updated successfully, but these errors were encountered:

aeschli · 2017-12-20T12:05:02Z

Giving out the language id in a competition context is, IMO, a fair request. But I'm not sure how useful it is as one often needs more context (am I in a tag, attribute, in a string, in a comment...) to really do something.

The second suggestion, being able to query for the language id of every offset, with a sync call is costlier: It would require tokenizing the document. This likely has to happen on the extension host as syncing that information over from the renderer is expensive. Also, the renderer doesn't have the tokens all the time (typically only for to the visible lines).
A bigger problem, however, is that the language information is TextMate grammar based. TextMate tokenization has its limits, due to the fact that grammars are regexp based and the text mate engine only see a line at a time and can not make changes to already emitted scopes.
However, language servers have precise scanners and parsers and also get the corner case right.

I had many discussions with @alexandrudima on this topic and I often lobbied to give out the embedded language id, but we always came to the conclusion that it will lead down a problematic path, putting more pressure on text mate grammars (that we don't control) and ending up with plenty of bugs due to corner cases that can't be fixed due to TextMates limitations.

Our investments should go into language servers and language servers extensibility.

mjbvz · 2017-12-20T19:27:32Z

Yes, we've discussed a generic "inspect token" api a few times previously and I believe we've always settled on it being potentially too expensive

Making the language servers smarter is always an option but it would be overkill for features like #40539, especially if VS Code already has the language info internally. #40539 tracks making < a suggest trigger character for JSX code. The challenge is that we don't want to trigger suggestions after you type a normal, javascript, less-than operator, only when you are in jsx and are starting a tag. A few ways this could be implemented:

Using this proposal

Add < as suggest trigger character for js.
Only actually trigger suggestions if the embedded language is jsx-tags

Smarter TS (hacky)

Add < as suggest trigger character for js.
Make TS only return suggestions immediately after a < when in jsx

Smarter TS (better)

Add < as suggest trigger character for js.
Add API to TS to check if we are in jsx at a given position
When triggering on <, query TS Server and only return suggestions when we know we are in jsx

Embedded language aware trigger characters

Allow extensions to register embedded language specific completion trigger characters
Use this API to register < as a trigger character for the jsx-tags language

@ramya-rao-a Weren't you running into a some limitations around embedded languages for emmet, or have those been worked around?

octref · 2017-12-21T19:59:26Z

For Smarter TS (better)

When triggering on <, query TS Server and only return suggestions when we know we are in jsx

Can't TS server just return empty CompletionItem[] when triggering on < outside JSX?

It seems paradoxical to me -- if a server can support embedded language, it would already have the languageId info, which would be more correct than the one generated from TM, if the server has a proper parser.

I agree with @aeschli making LS behavior dependent on TM grammar doesn't sound right.

TM grammar can often be wrong with scopes when program is invalid (a natural state of writing code)
TM scopes are expensive to re-generate on each file change
There are a lot of not-so-good TM grammars there due to its under-documentedness.

But I get the overkill argument. One way might be making vscode-textmate easily consumable by LS, so people can write simple JSX LS that inefficiently do some simple completion for the easy of development.

mjbvz · 2017-12-21T20:06:16Z

The TS server has no knowledge of what triggered completions. We do the gating on the extension side

On the second point: we already use the embedded language id for computing snippet completions, so I don't think this proposal will add very much overhead. If there are problems with the grammars, we should look into fixing these since some of our other basic language features depend on embedded language information already

jrieken · 2018-01-03T12:00:00Z

This topic starts to feel like "Groundhog Day". Yes, we do use some of the "language at position"-logic for snippets but that information isn't in the extension host nor exposed in any kind of API. I see how the 'CompletionContext'-proposal keeps this local but it will also make folks ask for more. I'd favour the alternative approach but in the way @octref proposes, using the vscode-textmate module directly, not making this appear too much supported and most importantly make this a little more controlled as you can provide your own grammar for your own use-case.

ramya-rao-a · 2018-01-03T18:57:57Z

@ramya-rao-a Weren't you running into a some limitations around embedded languages for emmet, or have those been worked around?

I ended up doing a lot of hacks on the text around the cursor which takes care of say 95% of scenarios.

How about using the language under the cursor to determine which completion providers to call instead of using the language of the document? I believe, when completions are triggered, the document is already tokenized up till current line. This way the token/language info remains local and extensions need not know about it

@jrieken @aeschli I remember bringing this up the last time we discussed this issue, but I don't recall what the conclusion was.

If we take this route, then @mjbvz you will be registering a completion provider for jsx-tags with trigger char as <

mjbvz · 2018-01-03T19:17:11Z

Thanks for the info @ramya-rao-a!

With using the embedded language to select completion providers, I worry that could impact existing completion providers scenarios if we weren't careful. If we were to turn this on by default, our css completion provider for example would start being triggered inside of html style blocks or js styled-components. We'd need a way to explicitly opt-in a completion provider for embedded languages or a way to create a fake embedded language document that we hand back to the provider (which we've previously ruled out as being too complicated).

That's why I'm proposing this smaller API addition. Since #40539 is my main use case currently, the alternative would be a TS specific solution. The TS Server already has a document ast, so I don't think we should maintain another copy of that or a copy of the tokenized file in the TS extension. Rather, we'd probably want to make the tsserver smarter so that it understands trigger characters and just does the right thing when < is pressed

jrieken · 2018-01-04T11:17:24Z

I remember bringing this up the last time we discussed this issue, but I don't recall what the conclusion was.

One more time: A language must not know that it gets embedded. Let me invent a new programming language that embeds any other language, my TM-grammar correctly spells out what portion of code is what language. It's not fair and illogical to let IntelliSense or any other feature redirect to the actual language. For instance, how would go-lang or cpp-lang know that it is embedded in foo-lang and how would it know what portions of a file to skip?

The solution is to turn it around. If foo-lang embeds other languages it must be able to understand them. It also means that we cannot ever redirect to the (TM-guessed) language at the current cursor position (event tho it sounds like the obvious solution).

I am willing to talk about using the textmate module in the extension host (not necessarily in the API, but as a good, easy to consume node-module). That hopefully makes it more obvious what we are dealing with: A bunch of regular expressions that assign parts of a text file semantics for visual purposes (syntax highlighting), not scanners, not syntax trees.

jrieken · 2018-01-04T11:20:09Z

Since #40539 is my main use case currently, the alternative would be a TS specific solution. The TS Server already has a document ast, so I don't think we should maintain another copy of that or a copy of the tokenized file in the TS extension. Rather, we'd probably want to make the tsserver smarter so that it understands trigger characters and just does the right thing when < is pressed

Yes, TypeScript supports jsx/tsx. It should do it properly which includes tooling and part of that is to know when to auto suggest.

mjbvz · 2018-01-04T22:33:19Z

Thanks all for the discussion

Here's my proposal to make TS trigger character aware: microsoft/TypeScript#21012

mjbvz added api feature-request Request for new features or functionality labels Dec 20, 2017

mjbvz assigned jrieken and mjbvz Dec 20, 2017

mjbvz added the under-discussion Issue is under discussion for relevance, priority, approach label Dec 20, 2017

mjbvz closed this as completed Jan 4, 2018

vscodebot bot locked and limited conversation to collaborators Feb 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add language field to CompletionContext #40541

Add language field to CompletionContext #40541

mjbvz commented Dec 20, 2017 •

edited

Loading

aeschli commented Dec 20, 2017 •

edited

Loading

mjbvz commented Dec 20, 2017

octref commented Dec 21, 2017

mjbvz commented Dec 21, 2017

jrieken commented Jan 3, 2018

ramya-rao-a commented Jan 3, 2018

mjbvz commented Jan 3, 2018

jrieken commented Jan 4, 2018 •

edited

Loading

jrieken commented Jan 4, 2018

mjbvz commented Jan 4, 2018

Add language field to CompletionContext #40541

Add language field to CompletionContext #40541

Comments

mjbvz commented Dec 20, 2017 • edited Loading

aeschli commented Dec 20, 2017 • edited Loading

mjbvz commented Dec 20, 2017

octref commented Dec 21, 2017

mjbvz commented Dec 21, 2017

jrieken commented Jan 3, 2018

ramya-rao-a commented Jan 3, 2018

mjbvz commented Jan 3, 2018

jrieken commented Jan 4, 2018 • edited Loading

jrieken commented Jan 4, 2018

mjbvz commented Jan 4, 2018

mjbvz commented Dec 20, 2017 •

edited

Loading

aeschli commented Dec 20, 2017 •

edited

Loading

jrieken commented Jan 4, 2018 •

edited

Loading