-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support codemods: use a CST #12806
Comments
I think it's better to create your own instrument, we don't plan do it, sorry |
I don't understand. You're saying I should reinvent prettier? What I am inventing is a tool that would make it possible for prettier to support this. |
In particular what I haven't built yet but could offer is a
|
Actually, I think use a CST is a good idea, it solve many problems, eg comments in We already got the tokens, but not used yet. Line 34 in 7ab6198
|
One reason we are not using tokens #10492 (comment). |
Oh yeah. I don't think the memory costs of tokens should be too bad. There's a lot of places in this code that I could cache the token objects. For example there's no reason why every |
ESLint's |
It looks like a pretty big/complex API but I could try implementing some parts of it over the CST. I should also definitely change |
@fisker You forgot about many logic around parent nodes + parens + context printing, using CST is not simple for it (you don't know context of token - for example Also no good CST parser currently for JS (so there are a lot of work here). I don't mean to say that it's not possible to create good enough logic using CST, but it will take a lot of work. And prettier was not design for it. |
And yes, to keep memory is low we need a lot of weak caches between AST nodes and CST tokens and rebuild some stuff very ofter, it means we will lose perf. @conartist6 I want to ask one question - Why you want this behaviour? What is the end goal? |
@alexander-akait What I'm talking about isn't a pure CST, it's a CST purely inside a normal AST. How you know about your parents and siblings is no different than it would be when traversing any AST. If you're looking at how to print a particular node you have access to its tokens, but you also have access to the traversal path with your parent references, which is what you need to have the ability to "look around". |
I'm just saying I like the way how ESLint uses AST, tokens, and comments. We definitely don't want add tokens to Node(we've try to remove attaching comments to Node), but we can use tokens instead of originalText in some place. |
I want the behavior because I believe there should be first-class support for writing codemods. Your code is data, and it should be as easy as possible to write change it by writing a program. In particular you should probably never do a textual "find and replace" in your codebase. You'll end up changing things you didn't mean to. Even the simplest changes to a whole codebase need to be performed as modifications to ASTs, as that is the only way you can be sure what you are modifying. I think there's an incredible amount of potential lost currently due to the inability to do this well. Imagine this: as a library author I change the name of an exported function in new major version. Some of my users have imported the function in 1200 files. So that they can upgrade, I publish a codemod that:
In order to be able to make this kind of functionality valuable, I need to be able to avoid damaging the project's existing formatting. For hand-formatted projects this means respecting hand formatting. For prettier-formatted projects this means running prettier. Recast is what is supposed to step in and solve this problem for me, but since prettier forked from it it has been suffering from prettier's success. While prettier's printer has evolved to near perfection, the pretty-printer embedded in recast has become obsolete. Nobody wants to do the same work twice. That's why instead of trying to compete with prettier I want to integrate with it. |
@fisker Why do you definitely not want tokens on node? |
I don't think any of our team member interest in this. |
By first-class support I really just mean that there should be a I think it actually makes more sense to offer My point is this though: what would |
For the record though, my tool just helps you extrapolate tokens from the AST. It can overwrite properties of nodes, but it doesn't need to. It just helps you be able to define an algorithms in terms of tokens by ensuring that it can always generate them if they don't exist, and that it can always use them if they do. Prettier would most likely want to use |
Just for fun, here's what I imagine being possible. I'm hoping this helps make it clear why I think a mutable tree is actually a really great and powerful thing: import { readFile, writeFile } from 'fs/promises';
import { rebuildTokens, attachComments, print } from 'cst-tokens';
import prettier from 'prettier/cst';
export async function codemod(path, transform) {
const cst = someParser.parse(await readFile(path, 'utf-8'));
rebuildTokens(cst);
attachComments(cst);
transform(cst);
prettier(cst);
await writeFile(path, print(cst));
} |
The difference between this kind of mutating the AST and the kind of the mutations you're trying to avoid I think is that these are mutations expressed in terms of public APIs explicitly about AST mutation. They're not leaking bits of internal state out of prettier, they're the actual interface through which prettier does its work. I think such changes would make prettier more interoperable with other tools, not less. In the end, Prettier is changing your CST and your AST, and I don't see that it helps to hide that. In addition to codemodding this offers pretty good support for IDE integration. You can keep once CST and keep passing it back and forth from prettier to the editor like so since it continues to stay in sync:
The only problem that would be left unsolved is the precise definition of "Sync CST back to file". Right now you'd have to do what you have to do anyway: print the entire CST out and replace the entire file contents. I don't think this is so bad since, as noted, it's what has to happen already. Rome imagined a scenario where prettier would work with CSTs that have immutable nodes, and thus they could compare two CSTs and avoid reprinting parts of the tree that were unchanged. It's an interesting idea, but Rome is so perf-focused that they left the javascript ecosystem for Rust anyway, and I don't think the changes I'm suggesting actually rule out the possibility of later implementing reference-immutability if there is actually demand for it. |
My plan is to actually build this integration for a small subset of syntax (imports). That way we can discuss the specific mechanisms by which it works, and I can also show people exactly what workflows this integration would make possible. |
OK the challenges in implementing this are obviously... large. The best thing I could say about the possibility of actually building something is that the The question is: how would the builder that rebuilds CST tokens know which node's tokens it should be changing? The We do have an AST that is known to match the contents of the doc, in particular because prettier actually does implement formatting changes that would change the AST (e.g. I'm not sure quite how yet, but I'm pretty sure that with the parts I've built it should be possible to traverse the AST nodes guided by the content of the doc's input, while outputting to each node's tokens array. I need to figure out exactly how to allow the grammar to guide that process, and how all the parts would work together. If I can manage to put all the parts together without needing any modifications to the structure of the doc to support my use case, I think I have a decent chance of making this real. If I can't, game over. |
No, ok, I think I can do it all, and I don't need anything major from prettier. Here's exactly how I think the code will look: const __prettier = require('prettier').__debug;
const cstTokens = require('cst-tokens');
function codemod(sourceText, transform) {
const ast = __prettier.parse(sourceText);
cstTokens.updateTokens(ast, { sourceText });
// Run some arbitrary codemod
transform(ast);
cstTokens.updateRanges(ast);
return __prettier.formatAST(ast, { originalText: cstTokens.print(ast) });
} |
I am pleased to say that I've finished enough of the The next major steps will be to build out the grammar, build a test suite, and implement comment attachment. The cool thing is that in theory this data structure allows me to implement comment attachment logic in such a way that it could be reused by any library, including but not limited to |
Related issues: #2068, #4675, #5998, and my own earlier #12709.
Currently this is possible:
I do not like this workflow:
I have started work on implementing an alternate workflow:
node.tokens
to every node. Now you have a concrete syntax tree, or CST!In order to make this possible, I am building a library whose working name is cst-tokens. Its functionality centers around building and synchronizing concrete syntax stored in
node.tokens
arrays.My research indicates that this structure could be used to satisfy the kinds of queries prettier currently can only make against source text (i.e.
originalText
), though actually implementing those queries is still likely to be nontrivial.The text was updated successfully, but these errors were encountered: