-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-write CSS extraction logic. Add values #1117
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The CSS extraction logic was grounded on production rules, did not include possible values of a property, did not know how to deal with functions and types that are namespaced to a property or to another type, and did not include selectors. This update changes all of that: - Selectors are now reported under a `selectors` property at the root level. - Possible values that some definition may take are now reported under a `values` property in the definition. - Functions and types that are namespaced to some other definition are included in the list of `values` of that definition. Some breaking changes along the way: 1. Arrays are now used throughout instead of indexed objects, because that's easier to deal with in code. It is easy to create an index from an array in JavaScript if someone needs it. 2. Function names are no longer enclosed in `<` and `>` because they are not defined in specs with these characters (as opposed to types). Beware though, references to functions in value syntax do use enclosing `<` and `>` characters. 3. The property `valuespaces` at the root level was replaced with a `values` property (and an array is also used there). This `values` property lists both `function` and `type` definitions that are not namespaced to anything in particular. There are no `value` definitions at the root level because `value` definitions are always namespaced to some other definition. And yes, naming is hard. Some additional non-breaking changes: - To distinguish between `function`, `type` and `value` definitions listed in a `values` property, definitions that appear in a `values` property have a `type` property. - Only namespaced values associated with a definition are listed under its `values` property. Non-namespaced values are not. For instance, `<quote>` is not listed as a value of the `<content-list>` type, even though its value syntax references it. This avoids duplicating constructs in the extracts. - Values are only listed under the deepest definition to which they apply. For instance, `open-quote` is only listed as a value of `<quote>` but neither as a value of the `<content-list>` type that references `<quote>` nor as a value of the `content` property that references `<content-list>`. This is also to avoid duplicating constructs in the extracts.
This updates makes the code record all extraction warnings in a `warnings` properties at the root of the reported extract, so that they can be used to produce anomaly reports. Warning messages reported so far are: - `"Missing dfn"` for production rules that could not be connected back to a dfn - `"Dangling value"` when a value is for a definition that could not be found
An empty attribute made the code crash. This happens for the second definition of `<size>` in: https://drafts.csswg.org/css-images-3/#typedef-size%E2%91%A0 (FWIW, that definition is actually invalid because it re-defines a type that is already defined in a production rule at the beginning of the section. This makes the extraction throw an error for now as it can't yet deal with duplicates)
The code used to throw when bumping into an unexpected duplicate definition. It now reports the duplicate definition in the "warnings" property. Same thing for duplicate definitions that, while not unexpected, cannot be merged with the base definition.
- Keep "type" for descriptors as that happens once in a while - Handle cycles when checking whether a dfn is the ancestor of another one - Only look at "approximate" name after looking at exact one - Skip IDL blocks in HTML spec (`pre > code.idl`) to avoid finding CSS values in IDL code - Adjust "saveCss" logic in crawler to new structure
CI test failures are expected. They are triggered by the schema change! |
dontcallmedom
approved these changes
Nov 23, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍🎉
HTML and Fullscreen re-define selectors that are already defined in CSS Selectors. There re-definition are internal to the spec. This update ignores them when preparing the CSS extract. Talking about exported definitions, all CSS properties, at-rules and descriptors are currently defined with a `data-export` attribute. Most selectors and CSS values (`function`, `type`, and `value`) are exported too, but 16 of them are not. Private values are needed to make sense of other constructs, so these ones do appear in the extracts.
tidoust
added a commit
that referenced
this pull request
Nov 28, 2022
This new major version modifies and completes the CSS extraction logic. See #1117 for details. No other change was made, meaning breaking and non-breaking changes **only affect CSS extracts**. Breaking changes: - Arrays are now used throughout instead of indexed objects. - Function names are no longer enclosed in `<` and `>` because they are not defined in specs with these characters (as opposed to types). Beware though, references to functions in value syntax do use enclosing `<` and `>` characters. - The property `valuespaces` at the root level is now named `values`. An array is used there as well. The `values` property lists both `function` and `type` definitions that are not namespaced to anything in particular (it used to also contain namespaced definitions). Non-breaking changes: - Selectors are now reported under a `selectors` property at the root level. - Possible values that some definition may take are now reported under a `values` property directly within the definition. - Functions and types that are namespaced to some other definition are included in the list of `values` of that definition. - Anomalies detected in the spec are now reported under a `warnings` property at the root of the extract. Four types of anomalies are reported: 1. **Missing definition**: when a production rule was found but when the spec does not include a corresponding `<dfn>` (or when that `<dfn>` does not have a `data-dfn-type` attribute that identifies a CSS construct) 2. **Duplicate definition**: when the spec defines the same term twice. 3. **Unmergeable definition**: when the spec defines the same property twice and both definitions cannot be merged. 4. **Dangling value**: when the spec defines a CSS "value" definition (`value`, `function` or `type`) for something and that something cannot be found in the spec - To distinguish between `function`, `type` and `value` definitions listed in a `values` property, definitions that appear in a `values` property have a `type` property. Additional notes: - Only namespaced values associated with a definition are listed under its `values` property. Non-namespaced values are not. For instance, `<quote>` is not listed as a value of the `<content-list>` type, even though its value syntax references it. This is to avoid duplicating constructs in the extracts. - Values are only listed under the deepest definition to which they apply. For instance, `open-quote` is only listed as a value of `<quote>` but neither as a value of the `<content-list>` type that references `<quote>` nor as a value of the `content` property that references `<content-list>`. This is also to avoid duplicating constructs in the extracts. - Some of the extracts contain things that may look weird at first, but that is "by design". For instance, [CSS Will change](https://drafts.csswg.org/css-will-change-1/) defines a [`<custom-ident>`](https://drafts.csswg.org/css-will-change-1/#valdef-will-change-custom-ident) `value` construct whose actual value is the `<custom-ident>` `type` construct defined in CSS Values. Having both a namespaced `value` and a non-namespaced `<type>` is somewhat common in CSS specs.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The CSS extraction logic was grounded on production rules, did not include possible values of a property, did not know how to deal with functions and types that are namespaced to a property or to another type, and did not include selectors. Also, likely anomalies in the spec were reported to the console but not in the extract itself.
This update changes all of that:
selectors
property at the root level.values
property directly within the definition.values
of that definition.warnings
property at the root of the extract. Four types of anomalies are reported:<dfn>
(or when that<dfn>
does not have adata-dfn-type
attribute that identifies a CSS construct)value
,function
ortype
) for something and that something cannot be found in the specBreaking changes along the way:
<
and>
because they are not defined in specs with these characters (as opposed to types). Beware though, references to functions in value syntax do use enclosing<
and>
characters.valuespaces
at the root level was replaced with avalues
property (and an array is also used there). Thisvalues
property lists bothfunction
andtype
definitions that are not namespaced to anything in particular. There are novalue
definitions at the root level becausevalue
definitions are always namespaced to some other definition. And yes, naming is hard.Some additional non-breaking changes:
function
,type
andvalue
definitions listed in avalues
property, definitions that appear in avalues
property have atype
property.values
property. Non-namespaced values are not. For instance,<quote>
is not listed as a value of the<content-list>
type, even though its value syntax references it. This avoids duplicating constructs in the extracts.open-quote
is only listed as a value of<quote>
but neither as a value of the<content-list>
type that references<quote>
nor as a value of thecontent
property that references<content-list>
. This is also to avoid duplicating constructs in the extracts.<custom-ident>
value
construct whose actual value is the<custom-ident>
type
construct defined in CSS Values. Having both a namespacedvalue
and a non-namespaced<type>
is somewhat common in CSS specs.Tests and schema were adjusted to the new structure and new tests were added to test values, selectors and warnings.
To validate the code, I wrote a small conversion script that takes the new structure, generates the "legacy" one out of it, and compares the result with a crawl result. See below for the code. I reviewed the differences. As far as I can tell, they are all net improvements:
css-nesting-1
,css-pseudo-4
,css-scoping-1
).Conversion script
Example of a new extract: CSS View Transitions Module Level 1
Note Webref and tools that process
@webref/css
contents will need to be adjusted to process the new structure (for Webref, the de-deduplication logic needs to be re-visited and the description of the package should be updated).