-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize WCC to only use Acorn when necessary #188
Comments
Can you show an example in the description of the before / after you're envisioning here? Regex seems kind of brittle when thinking about parsing an entire JS file for things like fwiw, it would be lovely to not have to use Acorn, since then we have to rely on it to handle anything a user could throw at it, like say a TypeScript or CoffeeScript file, without us having to care, and so that would certainly help loosen things up for WCC and make this possible for sure - #122 Just having a hard time seeing how Regex gives us the same amount of confidence as an AST though? 🤔 |
Sure, these comparisons are from 0.16.0 vs https://github.com/ProjectEvergreen/wcc/tree/enhancement/optimization-reducing-dependencies Current method for registering dependencies: function registerDependencies(moduleURL, definitions, depth = 0) {
const moduleContents = fs.readFileSync(moduleURL, 'utf-8');
const result = transform(moduleContents, {
transforms: ['typescript', 'jsx'],
jsxRuntime: 'preserve'
});
const nextDepth = depth += 1;
const customParser = getParser(moduleURL);
const parser = customParser ? customParser.parser : acorn.Parser;
const config = customParser ? customParser.config : {
...walk.base
};
walk.simple(parser.parse(result.code, {
ecmaVersion: 'latest',
sourceType: 'module'
}), {
ImportDeclaration(node) {
const specifier = node.source.value;
const isBareSpecifier = specifier.indexOf('.') !== 0 && specifier.indexOf('/') !== 0;
const extension = specifier.split('.').pop();
// would like to decouple .jsx from the core, ideally
// https://github.com/ProjectEvergreen/wcc/issues/122
if (!isBareSpecifier && ['js', 'jsx', 'ts'].includes(extension)) {
const dependencyModuleURL = new URL(node.source.value, moduleURL);
registerDependencies(dependencyModuleURL, definitions, nextDepth);
}
},
ExpressionStatement(node) {
if (isCustomElementDefinitionNode(node)) {
const { arguments: args } = node.expression;
const tagName = args[0].type === 'Literal'
? args[0].value // single and double quotes
: args[0].quasis[0].value.raw; // template literal
const tree = parseJsx(moduleURL);
const isEntry = nextDepth - 1 === 1;
definitions[tagName] = {
instanceName: args[1].name,
moduleURL,
source: generate(tree),
url: moduleURL,
isEntry
};
}
}
}, config);
} The Acorn parser is used twice here, once for the main parsing, and again in parseJsx to get the tree (even for non JSX files). And every file that is imported goes through this treatment... meaning Acorn parses js documents potentially 10s-100s of times depending on the number of components. Proposed method: function registerDependencies(moduleURL, definitions, depth = 0) {
const moduleContents = fs.readFileSync(moduleURL, 'utf-8');;
const shouldTransformAndParse = ['jsx', 'ts'].includes(moduleURL.pathname.split('.').pop());
// generateParsedJsx calls generate(parseJsx(moduleUrl))
const source = shouldTransformAndParse ? generateParsedJsx(moduleURL) : moduleContents;
let match;
const nextDepth = depth + 1;
// Remove comments before doing regex so we aren't matching code from within comments
const sourceNoComments = source.replace(/\/\*[\s\S]*?\*\/|\/\/.*$/gm, '')));
// Do a rudimentary check for 'import' to see if we should move forward with regex
if (source.includes('import')) {
// Then, we move forward with regex
const importRegex = /import(?:\s+[\w{},*\s]+)?\s+from\s+['"`]([^'"`\n]+)['"`]|import\s+['"`]([^'"`\n]+)['"`]/gm;
while ((match = importRegex.exec(sourceNoComments) {
const specifier = match[1] || match[2];
const isBareSpecifier = specifier.indexOf('.') !== 0 && specifier.indexOf('/') !== 0;
const extension = specifier.split('.').pop();
if (!isBareSpecifier && ['js', 'jsx', 'ts'].includes(extension)) {
const dependencyModuleURL = new URL(specifier, moduleURL);
registerDependencies(dependencyModuleURL, definitions, nextDepth);
}
}
}
// Do a rudimentary check for 'customElements.define' to see if we should move forward with regex
if (source.includes('customElements.define')) {
// Then, we move forward with regex
const customElementRegex = /customElements\.define\s*\(\s*(['"`])([^'"`]+)\1\s*,/g;
while ((match = customElementRegex.exec(sourceNoComments))) {
const tagName = match[2];
definitions[tagName] = {
instanceName: tagName,
moduleURL,
source,
url: moduleURL,
isEntry: depth === 1
};
}
}
} Here, we: Other optimizations were made in the proposal that were stripped for this comparison. Regarding regex being brittle in this context, I believe that import patterns and the customElements.define pattern might be predictable enough for regex to cover the edge cases. In this comparison, for example, we account for multiline imports and imports in comments. This regex can also be modified to handle dynamic imports as well, if necessary. Related question - if this looks like a viable path, should we support dynamic imports as well? |
Yeah good catch. Definitely as part of an effort like #116 at the very least it would be great if we could just pass the entire AST down to So at least to this point I think that could be addressed on its own, and so not necessarily as compelling to me in the case of abandoning an initial AST walk completely, per se.
Regarding However, doing this "shallow" check for something like
All that said, the more I look at it now, I guess really all the heavy lifting is really being done by parse5 anyway, right? Acorn was just used being as a way to grab little bits of "metadata" based on the presence of custom element definitions, which is really just the tag name, right? I wonder then if the only last thing to consider in the regex is to replace line breaks too? Just thinking about formatting tools like Prettier or other stylistic tools that users might use, although rare, based on some of my initial testing, where the const source = `
class Header extends HTMLElement {
connectedCallback() {
this.innerHTML = '<h1>Hello World</h1>';
}
}
customElements
.define('wcc-header', Header);
`;
let match;
if (source.includes('customElements.define')) {
const customElementRegex = /customElements\.define\s*\(\s*(['"`])([^'"`]+)\1\s*,/g;
while ((match = customElementRegex.exec(source))) {
const tagName = match[2];
console.log({ tagName });
}
} Logs nothing ➜ wcc git:(master) ✗ node no-acorn.js
➜ wcc git:(master) ✗ and similar results from Regex 101 playground? Another thong that would be interesting to have is a bench markwhere we can stress test WCC under a certain load, similar to something like js-framework-benchmark or ssr-performance-showdown which could be helpful in quantifying what kind of speed up we get here. Just a thought? |
Summary
Currently, WCC uses Acorn to parse every component, even when it isn't necessary. Regex can be used to locate imports and customElements definitions for .js files with more efficiency. Removing Acorn from wcc.js so it is only a dependency of the JSX loader means that eventually modularizing the JSX loader becomes a simpler task. Separating these concerns seems worthwhile.
Proposal:
https://github.com/ProjectEvergreen/wcc/tree/enhancement/optimization-reducing-dependencies
Currently passing all tests
Dependencies for wcc.js are reduced to:
And if the JSX loader is made to be a module in a future PR, the only dependencies for WCC core as a whole would be parse5 and fs.
Optimizations
Numerous other small improvements and a number of added tests.
The text was updated successfully, but these errors were encountered: