-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add post-processor and events post-processing module #1015
Conversation
This is a breaking change. It does two main things: 1. It adds a post-processing stage to Reffy. This makes it possible and easy to run modules that may need to mix data extracted by multiple crawling modules, and/or to mix data generated from other specs. The update converts logic that used to be hardcoded in the crawler such as idlparsed generation to post-processing modules. 2. It adds an "events" post-processing module to consolidate event extracts. More precisely, changes are: - Add a post-processor library that can run post-processing modules either at the spec level or at the crawl level. - Convert generateIDLParsed/saveIDLParsed and generateIDLNames/saveIDLNames to post-processing modules "idlparsed" and "idlnames". The former runs at the spec level. The latter runs at the crawl level. - Create a "csscomplete" post-processing module for post-crawl CSS completing tasks that were previously hardcoded in the crawler. - Create an "events" post-processing module that integrates the non-patch logic done in webref's PR w3c/webref#650 - Add a `--post` option to Reffy's CLI to specify the post-processing modules to run. None are run by default. This means that "idlparsed" and "idlnames" are no longer generated by default during a crawl. This also means that CSS properties are not completed (e.g. with the IDL attributes they generate) by default either. - Add a `--crawl` option to Reffy's CLI to make it skip the crawl and rather ingest a crawl result. The option is only really useful when combined with the new `--post` option. It makes it possible to run post-processing modules on an existing crawl result. - Make the crawler skip the crawl step when the `--crawl` option is used. - Drop the "generate-idlparsed" and "generate-idlnames" CLI tools. Same result may now be achieved by running the "idlparsed" and "idlnames" post-processing modules through Reffy's CLI. - Expose the post-processor as an entry in `index.js`. Bugs fixed along the way: - The `isLatestLevelThatPasses` function could crash due to a check being done too early. - The crawler was not setting the right `crawled` property initially. - The events extraction expected `crawled` to be set on the spec object, but it is typically set "after" the crawler is done running the modules. The window.location URL should be used instead. Tests adjusted accordingly. Still missing (could be done later on?): - Post-processor tests. - Update README to explain the post-processor. - Smarter workflow logic to order the post-processing modules based on what they depend on. In practice, that's not needed for now because the only theoretical hiccup is between "idlparsed" and "idlnames" and the former runs at the spec level, so the latter always runs after the former.
Result that receives the post-processor in a real crawl does not yet have a `shortname` property, so check should not have been based on it.
The function will be useful to run tests in Webref
I ran the crawler locally and confirms that the new version produces the same extracts as the previous ones. A few additional notes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with some reservations on some of the naming choices :)
Co-authored-by: Dominique Hazael-Massieux <[email protected]>
Per feedback, rename `--crawl` to `--use-crawl` to make it clearer that Reffy will use the provided parameter as input, and rename generic `getTreeInfo` function to a more specific `getInterfaceTreeInfo` function.
+1 to that idea |
Breaking changes: - Add post-processor and events post-processing module (#1015) Bug fixes: - Couple of bug fixes in crawler (#1019) - Fix exported functions crawlSpecs and getInterfaceTreeInfo (#1018) Bumped dependencies: - Bump rollup from 2.76.0 to 2.77.0 (#1017) - Bump commander from 9.3.0 to 9.4.0 (#1016) What "breaks" compared to v7, a.k.a "how to upgrade": - Reffy no longer completes CSS extracts per default to add generated IDL attributes and add properties defined in prose. Run the crawler with the `csscomplete` post-processing module to get the same result. Note the `csscomplete` module will look at `dfns` extracts to add properties defined in prose. In other words, command line should include something like: `--module dfns --post csscomplete` (or not mention `--module` at all as Reffy runs all crawl modules by default) - Reffy no longer outputs parsed CSS structures to the console when CSS extraction runs. This was not used anywhere. It would be trivial to do this in a post-processing module if that seems needed. - Reffy no longer produces "idlparsed" and "idlnames" extracts per default. Run the crawler with the `idlparsed` and `idlnames` post-processing modules. The `idlparsed` module needs `idl` extracts to run. The `idlnames` module needs `idlparsed` extracts to run. In other words, command line should include something like: `--module idl --post idlparsed --post idlnames` (or not mention `--module` at all as Reffy runs all crawl modules by default) - Reffy saves events extracts to an `events` folder again (instead of `spec-events`). Events extraction and events merging should be viewed as a beta feature for now, likely to be improved in future versions of Reffy.
This is a breaking change. It does two main things:
More precisely, changes are:
--post
option to Reffy's CLI to specify the post-processing modules to run. None are run by default. This means that "idlparsed" and "idlnames" are no longer generated by default during a crawl. This also means that CSS properties are not completed (e.g. with the IDL attributes they generate) by default either.--crawl
option to Reffy's CLI to make it skip the crawl and rather ingest a crawl result. The option is only really useful when combined with the new--post
option. It makes it possible to run post-processing modules on an existing crawl result.--crawl
option is used.index.js
.Bugs fixed along the way:
isLatestLevelThatPasses
function could crash due to a check being done too early.crawled
property initially.crawled
to be set on the spec object, but it is typically set "after" the crawler is done running the modules. The window.location URL should be used instead. Tests adjusted accordingly.Still missing (could be done later on?):