Optimize WCC to only use Acorn when necessary #188

briangrider · 2025-01-21T04:53:06Z

Summary

Currently, WCC uses Acorn to parse every component, even when it isn't necessary. Regex can be used to locate imports and customElements definitions for .js files with more efficiency. Removing Acorn from wcc.js so it is only a dependency of the JSX loader means that eventually modularizing the JSX loader becomes a simpler task. Separating these concerns seems worthwhile.

Proposal:

https://github.com/ProjectEvergreen/wcc/tree/enhancement/optimization-reducing-dependencies

Currently passing all tests

Dependencies for wcc.js are reduced to:

import { getParse } from './dom-shim.js';

import { generateParsedJsx } from './jsx-loader.js';
import { serialize } from 'parse5';
import fs from 'fs';

And if the JSX loader is made to be a module in a future PR, the only dependencies for WCC core as a whole would be parse5 and fs.

Optimizations

Removed all Acorn parsing from wcc.js unless using the JSX loader
Made parsing calls conditional on whether a file has a .jsx or .ts extension
Removed redundant processing when the same module is imported twice
Import regex doesn't run unless an import statement is found in a file
customElements.define regex doesn't run unless "customElements.define" is found in a file
Reduced number of readFileSync calls by reusing moduleContents for getTagName

Numerous other small improvements and a number of added tests.

thescientist13 · 2025-01-21T16:42:04Z

Can you show an example in the description of the before / after you're envisioning here?

Regex seems kind of brittle when thinking about parsing an entire JS file for things like import statements, usage of customElements.define etc. The value of something like an AST / acorn is that it is significantly more predictable and more guaranteed to get things "right", so a nice side by side could really help with seeing what the Regex version would look like compared to the Acorn version.

fwiw, it would be lovely to not have to use Acorn, since then we have to rely on it to handle anything a user could throw at it, like say a TypeScript or CoffeeScript file, without us having to care, and so that would certainly help loosen things up for WCC and make this possible for sure - #122

Just having a hard time seeing how Regex gives us the same amount of confidence as an AST though? 🤔

briangrider · 2025-01-21T19:56:08Z

Sure, these comparisons are from 0.16.0 vs https://github.com/ProjectEvergreen/wcc/tree/enhancement/optimization-reducing-dependencies

Current method for registering dependencies:

function registerDependencies(moduleURL, definitions, depth = 0) {
  const moduleContents = fs.readFileSync(moduleURL, 'utf-8');
  const result = transform(moduleContents, {
    transforms: ['typescript', 'jsx'],
    jsxRuntime: 'preserve'
  });
  const nextDepth = depth += 1;
  const customParser = getParser(moduleURL);
  const parser = customParser ? customParser.parser : acorn.Parser;
  const config = customParser ? customParser.config : {
    ...walk.base
  };

  walk.simple(parser.parse(result.code, {
    ecmaVersion: 'latest',
    sourceType: 'module'
  }), {
    ImportDeclaration(node) {
      const specifier = node.source.value;
      const isBareSpecifier = specifier.indexOf('.') !== 0 && specifier.indexOf('/') !== 0;
      const extension = specifier.split('.').pop();

      // would like to decouple .jsx from the core, ideally
      // https://github.com/ProjectEvergreen/wcc/issues/122
      if (!isBareSpecifier && ['js', 'jsx', 'ts'].includes(extension)) {
        const dependencyModuleURL = new URL(node.source.value, moduleURL);

        registerDependencies(dependencyModuleURL, definitions, nextDepth);
      }
    },
    ExpressionStatement(node) {
      if (isCustomElementDefinitionNode(node)) {
        const { arguments: args } = node.expression;
        const tagName = args[0].type === 'Literal'
          ? args[0].value // single and double quotes
          : args[0].quasis[0].value.raw; // template literal
        const tree = parseJsx(moduleURL);
        const isEntry = nextDepth - 1 === 1;

        definitions[tagName] = {
          instanceName: args[1].name,
          moduleURL,
          source: generate(tree),
          url: moduleURL,
          isEntry
        };
      }
    }
  }, config);
}

The Acorn parser is used twice here, once for the main parsing, and again in parseJsx to get the tree (even for non JSX files). And every file that is imported goes through this treatment... meaning Acorn parses js documents potentially 10s-100s of times depending on the number of components.

Proposed method:

function registerDependencies(moduleURL, definitions, depth = 0) {
  const moduleContents = fs.readFileSync(moduleURL, 'utf-8');;
  const shouldTransformAndParse = ['jsx', 'ts'].includes(moduleURL.pathname.split('.').pop());
  // generateParsedJsx calls generate(parseJsx(moduleUrl))
  const source = shouldTransformAndParse ? generateParsedJsx(moduleURL) : moduleContents;

  let match;
  const nextDepth = depth + 1;
  // Remove comments before doing regex so we aren't matching code from within comments
  const sourceNoComments = source.replace(/\/\*[\s\S]*?\*\/|\/\/.*$/gm, '')));

  // Do a rudimentary check for 'import' to see if we should move forward with regex
  if (source.includes('import')) {
    // Then, we move forward with regex
    const importRegex = /import(?:\s+[\w{},*\s]+)?\s+from\s+['"`]([^'"`\n]+)['"`]|import\s+['"`]([^'"`\n]+)['"`]/gm;

    while ((match = importRegex.exec(sourceNoComments) {
      const specifier = match[1] || match[2];
      const isBareSpecifier = specifier.indexOf('.') !== 0 && specifier.indexOf('/') !== 0;
      const extension = specifier.split('.').pop();

      if (!isBareSpecifier && ['js', 'jsx', 'ts'].includes(extension)) {
        const dependencyModuleURL = new URL(specifier, moduleURL);
        registerDependencies(dependencyModuleURL, definitions, nextDepth);
      }
    }
  }

  // Do a rudimentary check for 'customElements.define' to see if we should move forward with regex
  if (source.includes('customElements.define')) {
    // Then, we move forward with regex
    const customElementRegex = /customElements\.define\s*\(\s*(['"`])([^'"`]+)\1\s*,/g;

    while ((match = customElementRegex.exec(sourceNoComments))) {
      const tagName = match[2];
      definitions[tagName] = {
        instanceName: tagName,
        moduleURL,
        source,
        url: moduleURL,
        isEntry: depth === 1
      };
    }
  }
}

Here, we:
-[x] check to make sure the filetype is jsx or ts before using Acorn
-[x] check to make sure the import keyword is found in the moduleConents before looking for import statements
-[x] check to make sure "customElements.define" in the moduleContents before looking for definitions
-[x] use regex to grab the strings we need instead of Acorn

Other optimizations were made in the proposal that were stripped for this comparison.

Regarding regex being brittle in this context, I believe that import patterns and the customElements.define pattern might be predictable enough for regex to cover the edge cases.

In this comparison, for example, we account for multiline imports and imports in comments. This regex can also be modified to handle dynamic imports as well, if necessary.

Related question - if this looks like a viable path, should we support dynamic imports as well?

thescientist13 · 2025-02-06T03:05:28Z

The Acorn parser is used twice here, once for the main parsing, and again in parseJsx to get the tree (even for non JSX files).

Yeah good catch.

Definitely as part of an effort like #116 at the very least it would be great if we could just pass the entire AST down to parseJsx since then it would only have to do the walking / transformation, instead of the parsing all over again.

So at least to this point I think that could be addressed on its own, and so not necessarily as compelling to me in the case of abandoning an initial AST walk completely, per se.

-[x] check to make sure the import keyword is found in the moduleConents before looking for import statements
-[x] check to make sure "customElements.define" in the moduleContents before looking for definitions

Regarding import specifically, aside from the entry point files fed to renderToHTML / renderFromString, wouldn't this hit on almost every file anyway? Once you get past the first file, it would all have to have come in through an import statement anyway, right? So it seems like in the grand scheme of things, this wouldn't seem to be as contributing a factor to a performance boost unless you were only ever loading a couple JS files into WCC in the first place? Could be off in my analysis, so eager to hear what you think here.

However, doing this "shallow" check for something like customElements.define though, regardless of our position on removing Acorn parsing, I think could push the needle forward just on its own. I could easily see as many files being run through WCC that contain custom element as don't (e.g. first / third party libs and utils or whatever) and so just being able to bail early on those and avoid an entire Acorn cycle seems ideal.

I think either way, we would still have to consider some false positives slip through in this case, like if the word import is in a comment or just a text string, in which case we would parse for no reason, but since I'm already kind of afvocating for leaving the Acorn parsing in, I am just arguing against myself. 🙃

All that said, the more I look at it now, I guess really all the heavy lifting is really being done by parse5 anyway, right? Acorn was just used being as a way to grab little bits of "metadata" based on the presence of custom element definitions, which is really just the tag name, right?

I wonder then if the only last thing to consider in the regex is to replace line breaks too? Just thinking about formatting tools like Prettier or other stylistic tools that users might use, although rare, based on some of my initial testing, where the .define is on a new line, this would fail the current check, no?

const source = `
class Header extends HTMLElement {
  connectedCallback() {
    this.innerHTML = '<h1>Hello World</h1>';
  }
}

customElements
  .define('wcc-header', Header);
`;

let match;

if (source.includes('customElements.define')) {
  const customElementRegex = /customElements\.define\s*\(\s*(['"`])([^'"`]+)\1\s*,/g;

  while ((match = customElementRegex.exec(source))) {
    const tagName = match[2];
    console.log({ tagName });
  }
}

Logs nothing

➜  wcc git:(master) ✗ node no-acorn.js
➜  wcc git:(master) ✗

and similar results from Regex 101 playground?

Another thong that would be interesting to have is a bench markwhere we can stress test WCC under a certain load, similar to something like js-framework-benchmark or ssr-performance-showdown which could be helpful in quantifying what kind of speed up we get here. Just a thought?

briangrider added the enhancement Improvement to existing functionality label Jan 21, 2025

briangrider self-assigned this Jan 21, 2025

briangrider changed the title ~~Optimize WCC to only use Acorn when necessary/prep to modularize JSX loader~~ Optimize WCC to only use Acorn when necessary Jan 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize WCC to only use Acorn when necessary #188

Optimize WCC to only use Acorn when necessary #188

briangrider commented Jan 21, 2025 •

edited

Loading

thescientist13 commented Jan 21, 2025 •

edited

Loading

briangrider commented Jan 21, 2025 •

edited

Loading

thescientist13 commented Feb 6, 2025 •

edited

Loading

Optimize WCC to only use Acorn when necessary #188

Optimize WCC to only use Acorn when necessary #188

Comments

briangrider commented Jan 21, 2025 • edited Loading

Summary

Proposal:

Optimizations

thescientist13 commented Jan 21, 2025 • edited Loading

briangrider commented Jan 21, 2025 • edited Loading

thescientist13 commented Feb 6, 2025 • edited Loading

briangrider commented Jan 21, 2025 •

edited

Loading

thescientist13 commented Jan 21, 2025 •

edited

Loading

briangrider commented Jan 21, 2025 •

edited

Loading

thescientist13 commented Feb 6, 2025 •

edited

Loading