Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-write CSS extraction logic. Add values #1117

Merged
merged 12 commits into from
Nov 28, 2022
Merged

Re-write CSS extraction logic. Add values #1117

merged 12 commits into from
Nov 28, 2022

Conversation

tidoust
Copy link
Member

@tidoust tidoust commented Nov 23, 2022

The CSS extraction logic was grounded on production rules, did not include possible values of a property, did not know how to deal with functions and types that are namespaced to a property or to another type, and did not include selectors. Also, likely anomalies in the spec were reported to the console but not in the extract itself.

This update changes all of that:

  • Selectors are now reported under a selectors property at the root level.
  • Possible values that some definition may take are now reported under a values property directly within the definition.
  • Functions and types that are namespaced to some other definition are included in the list of values of that definition.
  • Anomalies detected in the spec are now reported under a warnings property at the root of the extract. Four types of anomalies are reported:
    1. Missing definition: when a production rule was found but when the spec does not include a corresponding <dfn> (or when that <dfn> does not have a data-dfn-type attribute that identifies a CSS construct)
    2. Duplicate definition: when the spec defines the same term twice.
    3. Unmergeable definition: when the spec defines the same property twice and both definitions cannot be merged.
    4. Dangling value: when the spec defines a CSS "value" definition (value, function or type) for something and that something cannot be found in the spec

Breaking changes along the way:

  1. Arrays are now used throughout instead of indexed objects because that's easier to deal with in code. It is easy to create an index from an array in JavaScript if someone needs it.
  2. Function names are no longer enclosed in < and > because they are not defined in specs with these characters (as opposed to types). Beware though, references to functions in value syntax do use enclosing < and > characters.
  3. The property valuespaces at the root level was replaced with a values property (and an array is also used there). This values property lists both function and type definitions that are not namespaced to anything in particular. There are no value definitions at the root level because value definitions are always namespaced to some other definition. And yes, naming is hard.

Some additional non-breaking changes:

  • To distinguish between function, type and value definitions listed in a values property, definitions that appear in a values property have a type property.
  • Only namespaced values associated with a definition are listed under its values property. Non-namespaced values are not. For instance, <quote> is not listed as a value of the <content-list> type, even though its value syntax references it. This avoids duplicating constructs in the extracts.
  • Values are only listed under the deepest definition to which they apply. For instance, open-quote is only listed as a value of <quote> but neither as a value of the <content-list> type that references <quote> nor as a value of the content property that references <content-list>. This is also to avoid duplicating constructs in the extracts.
  • Some of the extracts contain things that may look weird at first, but that's because the spec itself is weird. For instance, CSS Will change defines a <custom-ident> value construct whose actual value is the <custom-ident> type construct defined in CSS Values. Having both a namespaced value and a non-namespaced <type> is somewhat common in CSS specs.

Tests and schema were adjusted to the new structure and new tests were added to test values, selectors and warnings.

To validate the code, I wrote a small conversion script that takes the new structure, generates the "legacy" one out of it, and compares the result with a crawl result. See below for the code. I reviewed the differences. As far as I can tell, they are all net improvements:

  • New extract files get generated now that the selectors are also extracted (e.g. css-nesting-1, css-pseudo-4, css-scoping-1).
  • Namespaced values that used to be reported as global "valuespaces" are now properly reported under the property, at-rule, selector, descriptor, type or function for which they are valid.
  • Some type and function declarations for which a proper definition is missing are now reported as warnings.
Conversion script
const path = require('path');
const fs = require('fs').promises;
const assert = require('node:assert').strict;

// To be adjusted to local context!
const sourceFolder = path.join(__dirname, 'reports', 'ed', 'css');
const legacyFolder = path.join(__dirname, '..', 'webref', 'ed', 'css');

function convertDescriptor(desc) {
  const res = Object.assign({}, desc);
  if (res.values) {
    delete res.values;
  }
  return res;
}

function convertProperty(prop) {
  const res = Object.assign({}, prop);
  if (res.values) {
    delete res.values;
  }
  return res;
}

function convertAtrule(rule) {
  const res = Object.assign({}, rule);
  delete res.name;
  if (res.values) {
    delete res.values;
  }
  if (res.descriptors) {
    res.descriptors = res.descriptors.map(convertDescriptor);
  }
  if (res.prose && res.value) {
    delete res.prose;
  }
  return res;
}

function convertValue(value) {
  const res = Object.assign({}, value);
  delete res.name;
  delete res.type;
  if (res.values) {
    delete res.values;
  }
  return res;
}

function convert(extract) {
  const res = {
    spec: extract.spec,
    properties: {},
    atrules: {},
    valuespaces: {}
  };

  for (const prop of extract.properties) {
    res.properties[prop.name] = convertProperty(prop);
  }

  for (const rule of extract.atrules) {
    res.atrules[rule.name] = convertAtrule(rule);
  }

  for (const value of extract.values) {
    const name = value.name.startsWith('<') ? value.name : `<${value.name}>`;
    res.valuespaces[name] = convertValue(value);
  }

  return res;
}

async function readFiles(folder) {
  const res = {};
  const files = await fs.readdir(folder);
  for (const f of files) {
    if (f.endsWith('.json') && f !== 'package.json') {
      const text = await fs.readFile(path.join(folder, f), 'utf8');
      res[path.basename(f, '.json')] = JSON.parse(text);
    }
  }
  return res;
}

async function main() {
  const source = await readFiles(sourceFolder);
  const converted = {};
  for (const [filename, extract] of Object.entries(source)) {
    converted[filename] = convert(extract);
  }

  const legacy = await readFiles(legacyFolder);

  for (const shortname of Object.keys(converted)) {
    try {
      assert(legacy[shortname], `"${shortname}" not in legacy result`);
    }
    catch (err) {
      console.log(converted[shortname].spec.url, err.message);
    }
  }

  for (const shortname of Object.keys(legacy)) {
    try {
      assert(converted[shortname], `"${shortname}" not in new result`);
    }
    catch (err) {
      console.log(legacy[shortname].spec.url, err.message);
    }
  }

  for (const [shortname, extract] of Object.entries(converted)) {
    if (legacy[shortname]) {
      try {
        assert.deepEqual(extract, legacy[shortname]);
      }
      catch (err) {
        console.log(`\n${shortname}\n${extract.spec.url}\n`, err.message);
      }
    }
  }
}

main();
Example of a new extract: CSS View Transitions Module Level 1
{
  "spec": {
    "title": "CSS View Transitions Module Level 1",
    "url": "https://drafts.csswg.org/css-view-transitions-1/"
  },
  "properties": [
    {
      "name": "view-transition-name",
      "value": "none | <custom-ident>",
      "initial": "none",
      "appliesTo": "all elements",
      "inherited": "no",
      "percentages": "n/a",
      "computedValue": "as specified",
      "canonicalOrder": "per grammar",
      "animationType": "discrete",
      "values": [
        {
          "name": "none",
          "prose": "The element will not participate in a view transition.",
          "type": "value",
          "value": "none"
        },
        {
          "name": "<custom-ident>",
          "prose": "The element can participate in a view transition, as either an old or new element, with a view transition name equal to the <custom-ident>'s value.",
          "type": "value",
          "value": "<custom-ident>"
        }
      ],
      "styleDeclaration": [
        "view-transition-name",
        "viewTransitionName"
      ]
    }
  ],
  "atrules": [],
  "selectors": [
    {
      "name": "::view-transition",
      "prose": "This pseudo-element is the grouping container of all the other view-transition pseudo-elements. Its originating element is the document’s document element. The following user-agent origin styles apply to this element: html::view-transition { position: fixed; inset: 0; }"
    },
    {
      "name": "::view-transition-group( <pt-name-selector> )",
      "prose": "One of these pseudo-elements exists for each view-transition-name in a view transition, and holds the rest of the pseudo-elements corresponding to this view-transition-name. Its originating element is the ::view-transition pseudo-element. The following user-agent origin styles apply to this element: html::view-transition-group(*) { position: absolute; top: 0; left: 0; animation-duration: 0.25s; animation-fill-mode: both; } The selector for this and subsequently defined pseudo-elements is likely to change to indicate position in the pseudo-tree hierarchy."
    },
    {
      "name": "::view-transition-image-pair( <pt-name-selector> )",
      "prose": "One of these pseudo-elements exists for each view-transition-name being in a view transition, and holds the images of the old and new elements. Its originating element is the ::view-transition-group() pseudo-element with the same transition-name. The following user-agent origin styles apply to this element: html::view-transition-image-pair(*) { position: absolute; inset: 0; animation-duration: inherit; animation-fill-mode: inherit; } In addition to above, styles in the user-agent origin add isolation: isolate to this pseudo-element if it has both ::view-transition-new() and ::view-transition-old() as children. Isolation is only necessary to get the right cross-fade between new and old image pixels. Would it be simpler to always add it and try to optimize in the implementation?"
    },
    {
      "name": "::view-transition-old( <pt-name-selector> )",
      "prose": "One of these pseudo-elements exists for each element in the old DOM being animated by the view transition, and is a replaced element displaying the old element’s snapshot image. It has natural dimensions equal to the snapshot’s size. Its originating element is the ::view-transition-image-pair() pseudo-element with the same transition-name. The following user-agent origin styles apply to this element: html::view-transition-old(*) { position: absolute; inset-block-start: 0; inline-size: 100%; block-size: auto; animation-duration: inherit; animation-fill-mode: inherit; } In addition to above, styles in the user-agent origin add mix-blend-mode: plus-lighter to this pseudo element if the ancestor ::view-transition-image-pair() has both ::view-transition-new() and ::view-transition-old() as descendants. Additional user-agent origin styles added to animate these pseudo-elements are detailed in Animate a view transition."
    },
    {
      "name": "::view-transition-new( <pt-name-selector> )",
      "prose": "Identical to ::view-transition-old(), except it deals with the new element instead."
    }
  ],
  "values": [
    {
      "name": "<pt-name-selector>",
      "type": "type",
      "value": "'*' | <custom-ident>"
    }
  ]
}

Note Webref and tools that process @webref/css contents will need to be adjusted to process the new structure (for Webref, the de-deduplication logic needs to be re-visited and the description of the package should be updated).

The CSS extraction logic was grounded on production rules, did not include
possible values of a property, did not know how to deal with functions and
types that are namespaced to a property or to another type, and did not
include selectors.

This update changes all of that:
- Selectors are now reported under a `selectors` property at the root level.
- Possible values that some definition may take are now reported under a
`values` property in the definition.
- Functions and types that are namespaced to some other definition are
included in the list of `values` of that definition.

Some breaking changes along the way:
1. Arrays are now used throughout instead of indexed objects, because that's
easier to deal with in code. It is easy to create an index from an array in
JavaScript if someone needs it.
2. Function names are no longer enclosed in `<` and `>` because they are not
defined in specs with these characters (as opposed to types). Beware though,
references to functions in value syntax do use enclosing `<` and `>`
characters.
3. The property `valuespaces` at the root level was replaced with a `values`
property (and an array is also used there). This `values` property lists both
`function` and `type` definitions that are not namespaced to anything in
particular. There are no `value` definitions at the root level because `value`
definitions are always namespaced to some other definition. And yes, naming is
hard.

Some additional non-breaking changes:
- To distinguish between `function`, `type` and `value` definitions listed in a
`values` property, definitions that appear in a `values` property have a `type`
property.
- Only namespaced values associated with a definition are listed under its
`values` property. Non-namespaced values are not. For instance, `<quote>` is
not listed as a value of the `<content-list>` type, even though its value
syntax references it. This avoids duplicating constructs in the extracts.
- Values are only listed under the deepest definition to which they apply. For
instance, `open-quote` is only listed as a value of `<quote>` but neither as a
value of the `<content-list>` type that references `<quote>` nor as a value of
the `content` property that references `<content-list>`. This is also to avoid
duplicating constructs in the extracts.
This updates makes the code record all extraction warnings in a `warnings`
properties at the root of the reported extract, so that they can be used to
produce anomaly reports.

Warning messages reported so far are:
- `"Missing dfn"` for production rules that could not be connected back to a dfn
- `"Dangling value"` when a value is for a definition that could not be found
An empty attribute made the code crash.

This happens for the second definition of `<size>` in:
https://drafts.csswg.org/css-images-3/#typedef-size%E2%91%A0

(FWIW, that definition is actually invalid because it re-defines a type that is
already defined in a production rule at the beginning of the section. This makes
the extraction throw an error for now as it can't yet deal with duplicates)
The code used to throw when bumping into an unexpected duplicate definition. It
now reports the duplicate definition in the "warnings" property. Same thing for
duplicate definitions that, while not unexpected, cannot be merged with the base
definition.
- Keep "type" for descriptors as that happens once in a while
- Handle cycles when checking whether a dfn is the ancestor of another one
- Only look at "approximate" name after looking at exact one
- Skip IDL blocks in HTML spec (`pre > code.idl`) to avoid finding CSS values
in IDL code
- Adjust "saveCss" logic in crawler to new structure
@tidoust
Copy link
Member Author

tidoust commented Nov 23, 2022

CI test failures are expected. They are triggered by the schema change!

Copy link
Member

@dontcallmedom dontcallmedom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍🎉

tests/extract-css.js Show resolved Hide resolved
@dontcallmedom dontcallmedom linked an issue Nov 24, 2022 that may be closed by this pull request
HTML and Fullscreen re-define selectors that are already defined in CSS
Selectors. There re-definition are internal to the spec. This update ignores
them when preparing the CSS extract.

Talking about exported definitions, all CSS properties, at-rules and descriptors
are currently defined with a `data-export` attribute. Most selectors and CSS
values (`function`, `type`, and `value`) are exported too, but 16 of them are
not. Private values are needed to make sense of other constructs, so these ones
do appear in the extracts.
@dontcallmedom dontcallmedom reopened this Nov 25, 2022
@tidoust tidoust merged commit db098cf into main Nov 28, 2022
@tidoust tidoust deleted the css-values branch November 28, 2022 08:30
tidoust added a commit that referenced this pull request Nov 28, 2022
This new major version modifies and completes the CSS extraction logic.
See #1117 for details.

No other change was made, meaning breaking and non-breaking changes **only
affect CSS extracts**.

Breaking changes:
- Arrays are now used throughout instead of indexed objects.
- Function names are no longer enclosed in `<` and `>` because they are not
defined in specs with these characters (as opposed to types). Beware though,
references to functions in value syntax do use enclosing `<` and `>` characters.
- The property `valuespaces` at the root level is now named `values`. An array
is used there as well. The `values` property lists both `function` and `type`
definitions that are not namespaced to anything in particular (it used to also
contain namespaced definitions).

Non-breaking changes:
- Selectors are now reported under a `selectors` property at the root level.
- Possible values that some definition may take are now reported under a
`values` property directly within the definition.
- Functions and types that are namespaced to some other definition are included
in the list of `values` of that definition.
- Anomalies detected in the spec are now reported under a `warnings` property at
the root of the extract. Four types of anomalies are reported:
  1. **Missing definition**: when a production rule was found but when the spec
  does not include a corresponding `<dfn>` (or when that `<dfn>` does not have a
  `data-dfn-type` attribute that identifies a CSS construct)
  2. **Duplicate definition**: when the spec defines the same term twice.
  3. **Unmergeable definition**: when the spec defines the same property twice
  and both definitions cannot be merged.
  4. **Dangling value**: when the spec defines a CSS "value" definition
  (`value`, `function` or `type`) for something and that something cannot be
  found in the spec
- To distinguish between `function`, `type` and `value` definitions listed in a
`values` property, definitions that appear in a `values` property have a `type`
property.

Additional notes:
- Only namespaced values associated with a definition are listed under its
`values` property. Non-namespaced values are not. For instance, `<quote>` is not
listed as a value of the `<content-list>` type, even though its value syntax
references it. This is to avoid duplicating constructs in the extracts.
- Values are only listed under the deepest definition to which they apply. For
instance, `open-quote` is only listed as a value of `<quote>` but neither as a
value of the `<content-list>` type that references `<quote>` nor as a value of
the `content` property that references `<content-list>`. This is also to avoid
duplicating constructs in the extracts.
- Some of the extracts contain things that may look weird at first, but that is
"by design". For instance, [CSS Will
change](https://drafts.csswg.org/css-will-change-1/) defines a
[`<custom-ident>`](https://drafts.csswg.org/css-will-change-1/#valdef-will-change-custom-ident)
`value` construct whose actual value is the `<custom-ident>` `type` construct
defined in CSS Values. Having both a namespaced `value` and a non-namespaced
`<type>` is somewhat common in CSS specs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Take data-dfn-for into account for CSS value space extraction
2 participants