Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add post-processor and events post-processing module #1015

Merged
merged 8 commits into from
Jul 18, 2022

Conversation

tidoust
Copy link
Member

@tidoust tidoust commented Jul 14, 2022

This is a breaking change. It does two main things:

  1. It adds a post-processing stage to Reffy. This makes it possible and easy to run modules that may need to mix data extracted by multiple crawling modules, and/or to mix data generated from other specs. The update converts logic that used to be hardcoded in the crawler such as idlparsed generation to post-processing modules.
  2. It adds an "events" post-processing module to consolidate event extracts.

More precisely, changes are:

  • Add a post-processor library that can run post-processing modules either at the spec level or at the crawl level.
  • Convert generateIDLParsed/saveIDLParsed and generateIDLNames/saveIDLNames to post-processing modules "idlparsed" and "idlnames". The former runs at the spec level. The latter runs at the crawl level.
  • Create a "csscomplete" post-processing module for post-crawl CSS completing tasks that were previously hardcoded in the crawler.
  • Create an "events" post-processing module that integrates the non-patch logic done in webref's PR Adjust scripts to also create @webref/events package webref#650
  • Add a --post option to Reffy's CLI to specify the post-processing modules to run. None are run by default. This means that "idlparsed" and "idlnames" are no longer generated by default during a crawl. This also means that CSS properties are not completed (e.g. with the IDL attributes they generate) by default either.
  • Add a --crawl option to Reffy's CLI to make it skip the crawl and rather ingest a crawl result. The option is only really useful when combined with the new --post option. It makes it possible to run post-processing modules on an existing crawl result.
  • Make the crawler skip the crawl step when the --crawl option is used.
  • Drop the "generate-idlparsed" and "generate-idlnames" CLI tools. Same result may now be achieved by running the "idlparsed" and "idlnames" post-processing modules through Reffy's CLI.
  • Expose the post-processor as an entry in index.js.

Bugs fixed along the way:

  • The isLatestLevelThatPasses function could crash due to a check being done too early.
  • The crawler was not setting the right crawled property initially.
  • The events extraction expected crawled to be set on the spec object, but it is typically set "after" the crawler is done running the modules. The window.location URL should be used instead. Tests adjusted accordingly.

Still missing (could be done later on?):

  • Post-processor tests.
  • Update README to explain the post-processor.
  • Smarter workflow logic to order the post-processing modules based on what they depend on. In practice, that's not needed for now because the only theoretical hiccup is between "idlparsed" and "idlnames" and the former runs at the spec level, so the latter always runs after the former.

tidoust added 4 commits July 13, 2022 18:57
This is a breaking change. It does two main things:
1. It adds a post-processing stage to Reffy. This makes it possible and easy to
run modules that may need to mix data extracted by multiple crawling modules,
and/or to mix data generated from other specs. The update converts logic that
used to be hardcoded in the crawler such as idlparsed generation to
post-processing modules.
2. It adds an "events" post-processing module to consolidate event extracts.

More precisely, changes are:
- Add a post-processor library that can run post-processing modules either at
the spec level or at the crawl level.
- Convert generateIDLParsed/saveIDLParsed and generateIDLNames/saveIDLNames
to post-processing modules "idlparsed" and "idlnames". The former runs at the
spec level. The latter runs at the crawl level.
- Create a "csscomplete" post-processing module for post-crawl CSS completing
tasks that were previously hardcoded in the crawler.
- Create an "events" post-processing module that integrates the non-patch logic
done in webref's PR w3c/webref#650
- Add a `--post` option to Reffy's CLI to specify the post-processing modules
to run. None are run by default. This means that "idlparsed" and "idlnames" are
no longer generated by default during a crawl. This also means that CSS
properties are not completed (e.g. with the IDL attributes they generate)
by default either.
- Add a `--crawl` option to Reffy's CLI to make it skip the crawl and rather
ingest a crawl result. The option is only really useful when combined with the
new `--post` option. It makes it possible to run post-processing modules on an
existing crawl result.
- Make the crawler skip the crawl step when the `--crawl` option is used.
- Drop the "generate-idlparsed" and "generate-idlnames" CLI tools. Same result
may now be achieved by running the "idlparsed" and "idlnames" post-processing
modules through Reffy's CLI.
- Expose the post-processor as an entry in `index.js`.

Bugs fixed along the way:
- The `isLatestLevelThatPasses` function could crash due to a check being done
too early.
- The crawler was not setting the right `crawled` property initially.
- The events extraction expected `crawled` to be set on the spec object, but it
is typically set "after" the crawler is done running the modules. The
window.location URL should be used instead. Tests adjusted accordingly.

Still missing (could be done later on?):
- Post-processor tests.
- Update README to explain the post-processor.
- Smarter workflow logic to order the post-processing modules based on what
they depend on. In practice, that's not needed for now because the only
theoretical hiccup is between "idlparsed" and "idlnames" and the former runs
at the spec level, so the latter always runs after the former.
Result that receives the post-processor in a real crawl does not yet have a
`shortname` property, so check should not have been based on it.
The function will be useful to run tests in Webref
@tidoust
Copy link
Member Author

tidoust commented Jul 16, 2022

I ran the crawler locally and confirms that the new version produces the same extracts as the previous ones.

A few additional notes:

  1. The events post-processing module generates an events.json file in the root output folder and not an index.json file under an events sub-folder. We'll have to adjust the code that will generate the @webref/events package accordingly but that should be a no-brainer.
  2. A consequence of 1. is that we could actually envision getting back to events for the name of the folder that contains the raw events extracts, instead of spec-events.
  3. Webref's curation code will need to be adjusted to the new version in any case to run the post-processing modules on the patched results.

@tidoust tidoust marked this pull request as ready for review July 16, 2022 07:48
@tidoust tidoust requested a review from dontcallmedom July 16, 2022 07:48
Copy link
Member

@dontcallmedom dontcallmedom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some reservations on some of the naming choices :)

reffy.js Outdated Show resolved Hide resolved
reffy.js Outdated Show resolved Hide resolved
src/lib/post-processor.js Outdated Show resolved Hide resolved
src/lib/util.js Outdated Show resolved Hide resolved
tidoust and others added 3 commits July 18, 2022 10:39
Co-authored-by: Dominique Hazael-Massieux <[email protected]>
Per feedback, rename `--crawl` to `--use-crawl` to make it clearer that Reffy
will use the provided parameter as input, and rename generic `getTreeInfo`
function to a more specific `getInterfaceTreeInfo` function.
@dontcallmedom
Copy link
Member

2. A consequence of 1. is that we could actually envision getting back to events for the name of the folder that contains the raw events extracts, instead of spec-events.

+1 to that idea

@tidoust tidoust merged commit 7a2afc6 into main Jul 18, 2022
@tidoust tidoust deleted the postprocessing-and-events branch July 18, 2022 12:39
tidoust added a commit that referenced this pull request Jul 18, 2022
Breaking changes:
- Add post-processor and events post-processing module (#1015)

Bug fixes:
- Couple of bug fixes in crawler (#1019)
- Fix exported functions crawlSpecs and getInterfaceTreeInfo (#1018)

Bumped dependencies:
- Bump rollup from 2.76.0 to 2.77.0 (#1017)
- Bump commander from 9.3.0 to 9.4.0 (#1016)

What "breaks" compared to v7, a.k.a "how to upgrade":
- Reffy no longer completes CSS extracts per default to add generated IDL
attributes and add properties defined in prose. Run the crawler with the
`csscomplete` post-processing module to get the same result. Note the
`csscomplete` module will look at `dfns` extracts to add properties defined in
prose. In other words, command line should include something like:
    `--module dfns --post csscomplete`
(or not mention `--module` at all as Reffy runs all crawl modules by default)
- Reffy no longer outputs parsed CSS structures to the console when CSS
extraction runs. This was not used anywhere. It would be trivial to do this in
a post-processing module if that seems needed.
- Reffy no longer produces "idlparsed" and "idlnames" extracts per default. Run
the crawler with the `idlparsed` and `idlnames` post-processing modules. The
`idlparsed` module needs `idl` extracts to run. The `idlnames` module needs
`idlparsed` extracts to run. In other words, command line should include
something like:
   `--module idl --post idlparsed --post idlnames`
(or not mention `--module` at all as Reffy runs all crawl modules by default)
- Reffy saves events extracts to an `events` folder again (instead of
`spec-events`). Events extraction and events merging should be viewed as a
beta feature for now, likely to be improved in future versions of Reffy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants