diff --git a/source b/source
index d93deeda001..38926c6a294 100644
--- a/source
+++ b/source
@@ -2097,31 +2097,6 @@ a.setAttribute('href', 'https://example.com/'); // change the content attribute
this specification needs to treat as not being ASCII-compatible encodings.
-

The term code unit is used as defined in the Web IDL specification: a 16 bit - unsigned integer, the smallest atomic component of a DOMString. (This is a narrower definition than the one used in - Unicode, and is not the same as a code point.)

- -

The term Unicode code point means a Unicode scalar value where possible, and - an isolated surrogate code point when not. When a conformance requirement is defined in terms of - characters or Unicode code points, a pair of code units consisting - of a high surrogate followed by a low surrogate must be treated as the single code point - represented by the surrogate pair, but isolated surrogates must each be treated as the single code - point with the value of the surrogate.

- -

In this specification, the term character, when not qualified as Unicode - character, is synonymous with the term Unicode code point.

- -

The term Unicode character is used to mean a Unicode scalar value (i.e. any - Unicode code point that is not a surrogate code point).

- -

The code-unit length of a string is the number of code - units in that string.

- -

This complexity results from the historical decision to define the DOM API in - terms of 16 bit (UTF-16) code units, rather than in terms of Unicode characters.
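As a non-normative illustration of the distinction drawn in these definitions: in JavaScript, `String.prototype.length` counts UTF-16 code units, while iterating a string with `for...of` yields code points. The iterator pairs a high surrogate followed by a low surrogate into one code point and yields an isolated surrogate on its own. The helper names `codeUnitLength` and `codePointLength` are hypothetical and used only for this sketch.

```javascript
// Count UTF-16 code units: this is what String.prototype.length reports.
function codeUnitLength(s) {
  return s.length;
}

// Count code points: a high surrogate followed by a low surrogate counts as
// one code point, and an isolated surrogate counts as one code point on its
// own. The string iterator (for...of) applies exactly this pairing rule.
function codePointLength(s) {
  let n = 0;
  for (const _ of s) n++;
  return n;
}

const samples = [
  "abc",        // three ASCII characters
  "\u{1F600}",  // U+1F600, stored as the surrogate pair D83D DE00
  "a\uD800b",   // an isolated high surrogate between two letters
];
for (const s of samples) {
  console.log(JSON.stringify(s), codeUnitLength(s), codePointLength(s));
}
```

For the emoji sample the two counts differ (two code units, one code point); for strings without surrogates they agree.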

-
@@ -2390,6 +2365,12 @@ a.setAttribute('href', 'https://example.com/'); // change the content attribute

The following terms are defined in the WHATWG Infra standard:

-

An image should not be used if Unicode characters would serve an identical purpose. Only when - the text cannot be directly represented using Unicode, e.g. because of decorations or because the - character is not in the Unicode character set (as in the case of gaiji), would an image be - appropriate.

+

An image should not be used if characters would serve an identical purpose. Only when the text + cannot be directly represented using text, e.g., because of decorations or because there is no + appropriate character (as in the case of gaiji), would an image be appropriate.

If an author is tempted to use an image because their default system font does not support a given character, then Web Fonts are a better solution than images.

@@ -47710,7 +47690,7 @@ ldh-str = < as defined in form control minlength attribute.

If the input element has a maximum allowed value length, then the - code-unit length of the value of the element's + JavaScript string length of the value of the element's value attribute must be equal to or less than the element's maximum allowed value length.

@@ -50483,7 +50463,7 @@ interface HTMLTextAreaElement : HTMLElement {
form control maxlength attribute.

If the textarea element has a maximum allowed value length, then the - element's children must be such that the code-unit length of the value of the + element's children must be such that the JavaScript string length of the value of the element's textContent IDL attribute with the textarea line break normalization transformation applied is equal to or less than the element's maximum allowed value length.

@@ -50625,8 +50605,8 @@ interface HTMLTextAreaElement : HTMLElement {

The textLength IDL attribute must - return the code-unit length of the element's API - value.

+ return the JavaScript string length of the element's API value.

The willValidate, validity, and HTMLLegendElement : HTMLElement {

A form control maxlength attribute, controlled by the dirty value flag, declares a limit on the number of characters a user can input. The "number of characters" is - measured using code-unit length and, in the case of textarea elements, - with all line breaks normalized to a single character (as opposed to CRLF pairs).

+ measured using JavaScript string length and, in the case of textarea + elements, with all newlines normalized to a single character (as opposed to CRLF pairs).

If an element has its form control maxlength attribute specified, the attribute's value must be a valid @@ -51977,12 +51957,12 @@ interface HTMLLegendElement : HTMLElement {

Constraint validation: If an element has a maximum allowed value length, its dirty value flag is true, its value was last changed by a user edit (as opposed to a change - made by a script), and the code-unit length of the element's + JavaScript string length of the element's API value is greater than the element's maximum allowed value length, then the element is suffering from being too long.

User agents may prevent the user from causing the element's API value to be set to a value whose code-unit + API value to be set to a value whose JavaScript string length is greater than the element's maximum allowed value length.

In the case of textarea elements, the HTMLLegendElement : HTMLElement {

A form control minlength attribute, controlled by the dirty value flag, declares a lower bound on the number of characters a user can input. The "number of characters" is - measured using code-unit length and, in the case of textarea elements, - with all line breaks normalized to a single character (as opposed to CRLF pairs).

+ measured using JavaScript string length and, in the case of textarea + elements, with all newlines normalized to a single character (as opposed to CRLF pairs).

The minlength attribute does not imply the required attribute. If the form control has no HTMLLegendElement : HTMLElement { length, its dirty value flag is true, its value was last changed by a user edit (as opposed to a change made by a script), its value is not the empty string, and - the code-unit length of the element's API + the JavaScript string length of the element's API value is less than the element's minimum allowed value length, then the element is suffering from being too short.
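A minimal sketch of the length checks in the maxlength and minlength hunks, under stated assumptions. The `state` object and its fields (`value`, `isTextarea`, `dirty`, `lastChangedByUser`, `maxLength`, `minLength`) are hypothetical stand-ins for the element's API value, its type, its dirty value flag, whether the last change was a user edit, and its maximum and minimum allowed value lengths. The sketch also assumes newline normalization turns each CRLF pair, and then each remaining lone CR, into a single LF before the JavaScript string length (the count of UTF-16 code units) is taken.

```javascript
// Assumed normalization: each CRLF pair, then each remaining lone CR,
// becomes a single LF.
function normalizeNewlines(value) {
  return value.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

// JavaScript string length: the number of UTF-16 code units, so a
// character outside the BMP (a surrogate pair) counts as 2.
function measuredLength(state) {
  const v = state.isTextarea ? normalizeNewlines(state.value) : state.value;
  return v.length;
}

// Only user edits to a dirty control are checked, mirroring the
// conditions listed in the prose.
function isTooLong(state) {
  return state.maxLength !== undefined &&
    state.dirty &&
    state.lastChangedByUser &&
    measuredLength(state) > state.maxLength;
}

// The empty string is never "too short"; minlength does not imply required.
function isTooShort(state) {
  return state.minLength !== undefined &&
    state.dirty &&
    state.lastChangedByUser &&
    state.value !== "" &&
    measuredLength(state) < state.minLength;
}

const edited = { value: "a\r\nb", isTextarea: true, dirty: true,
                 lastChangedByUser: true, maxLength: 3, minLength: 4 };
console.log(measuredLength(edited), isTooLong(edited), isTooShort(edited));
```

Because the count is in code units rather than code points, a single emoji consumes two units of a `maxlength` budget.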

@@ -55898,9 +55878,9 @@ fur
  • For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a U+0023 NUMBER SIGN character (#), one or more ASCII digits - representing the Unicode code point of the character in base ten, and finally a U+003B - SEMICOLON character (;).

  + representing the code point of the character in base ten, and finally a U+003B SEMICOLON character (;).
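The replacement rule in this step can be sketched as follows. The encoder test is an assumption: `canEncode` is a hypothetical callback, and the example uses a Latin-1-style check (code points up to U+00FF) purely for illustration, not any real encoder.

```javascript
// Replace each code point the target encoding cannot represent with
// "&#" + decimal code point + ";". Iterating with for...of visits whole
// code points, so a surrogate pair becomes one reference, not two.
function escapeUnencodable(str, canEncode) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    out += canEncode(cp) ? ch : "&#" + cp.toString(10) + ";";
  }
  return out;
}

// Hypothetical encodability test, for illustration only.
const latin1Like = (cp) => cp <= 0xff;

console.log(escapeUnencodable("caf\u00E9 \u20AC \u{1F600}", latin1Like));
```

Here U+00E9 passes through unchanged, while U+20AC and U+1F600 become `&#8364;` and `&#128512;`.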

    +
    @@ -58865,11 +58845,11 @@ space = %x0020 ; U+0020 SPACE
    star = %x002A ; U+002A ASTERISK (*)
    slash = %x002F ; U+002F SOLIDUS (/)
    not-newline = %x0000-0009 / %x000B-10FFFF
    - ; a Unicode character other than U+000A LINE FEED (LF)
    + ; a scalar value other than U+000A LINE FEED (LF)
    not-star = %x0000-0029 / %x002B-10FFFF
    - ; a Unicode character other than U+002A ASTERISK (*)
    + ; a scalar value other than U+002A ASTERISK (*)
    not-slash = %x0000-002E / %x0030-10FFFF
    - ; a Unicode character other than U+002F SOLIDUS (/)
    + ; a scalar value other than U+002F SOLIDUS (/)

    This corresponds to putting the contents of the element in JavaScript comments.

    @@ -71088,14 +71068,15 @@ Demos:
  • -

    If and while line is longer than maximum length - Unicode code points long, run the following substeps:

    +

    While line's length is greater than + maximum length:

      -
    Append the first maximum length Unicode code points of line to output.

    +
    Append the first maximum length code points of line to + output.

    -
    Remove the first maximum length Unicode code points from line.

    +
    Remove the first maximum length code points from line.

    Append a U+000D CARRIAGE RETURN character (CR) to output.
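The folding loop in this hunk can be sketched in JavaScript. Two assumptions are labeled here: the hunk shows only the CR being appended, so the sketch assumes the steps that follow append a LF and a single SPACE (conventional vCard/iCalendar line folding), and `maxLength` is passed in as a parameter. The sketch stops before any trailing line terminator. Counting is done on code points, so a surrogate pair is never split across a fold.

```javascript
// Fold `line` so that no segment exceeds maxLength code points.
// Assumption: each fold is CR, LF, SPACE.
function foldLine(line, maxLength) {
  let rest = Array.from(line); // one array element per code point
  let output = "";
  while (rest.length > maxLength) {
    output += rest.slice(0, maxLength).join(""); // append the first maxLength code points
    rest = rest.slice(maxLength);                // remove them from the line
    output += "\r\n ";                           // CR, LF, SPACE (assumed)
  }
  output += rest.join("");
  return output;
}

console.log(JSON.stringify(foldLine("abcdefg", 3)));
```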

    @@ -72007,14 +71988,15 @@ END:VCARD
    -

      If and while line is longer than maximum length - Unicode code points long, run the following substeps:

      +

      While line's length is greater than + maximum length:

        -
      Append the first maximum length Unicode code points of line to output.

      +
      Append the first maximum length code points of line to + output.

      -
      Remove the first maximum length Unicode code points from line.

      +
      Remove the first maximum length code points from line.

      Append a U+000D CARRIAGE RETURN character (CR) to output.

      @@ -74121,8 +74103,7 @@ addShortcutKeyLabel(document.getElementById('c')); element.

        If specified, the value must be an ordered set of unique space-separated tokens - that are case-sensitive, each of which must be exactly one Unicode code point in - length.

        + that are case-sensitive, each of which must be exactly one code point in length.

        @@ -74212,8 +74193,8 @@ addShortcutKeyLabel(document.getElementById('c'));
          -
        If the value is not a string exactly one Unicode code point in length, then skip the - remainder of these steps for this value.

        +
        If the value is not a string exactly one code point in length, then skip the remainder + of these steps for this value.

        If the value does not correspond to a key on the system's keyboard, then skip the remainder of these steps for this value.
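A sketch of the token filtering described in these steps, assuming the attribute value is split on ASCII whitespace (TAB, LF, FF, CR, SPACE). It keeps only tokens that are exactly one code point long; the keyboard check in the last step is platform-specific and is not modeled. The function names are hypothetical.

```javascript
// Split on ASCII whitespace (TAB, LF, FF, CR, SPACE), dropping empty tokens.
function splitOnAsciiWhitespace(value) {
  return value.split(/[\t\n\f\r ]+/).filter((t) => t !== "");
}

// Keep the tokens that are exactly one code point long. Array.from(t)
// iterates by code point, so "\u{1F600}" (two code units) is kept.
function candidateAccessKeys(attrValue) {
  return splitOnAsciiWhitespace(attrValue)
    .filter((t) => Array.from(t).length === 1);
}

console.log(candidateAccessKeys(" s  \u{1F600} ab 0 "));
```

Measuring in code points rather than `length` is what lets a non-BMP character qualify as a single key.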

        @@ -94019,9 +94000,9 @@ space = %x0020 ; U+0020 SPACE
        colon = %x003A ; U+003A COLON (:)
        bom = %xFEFF ; U+FEFF BYTE ORDER MARK
        name-char = %x0000-0009 / %x000B-000C / %x000E-0039 / %x003B-10FFFF
        - ; a Unicode character other than U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), or U+003A COLON (:)
        + ; a scalar value other than U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), or U+003A COLON (:)
        any-char = %x0000-0009 / %x000B-000C / %x000E-10FFFF
        - ; a Unicode character other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)
        + ; a scalar value other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)

          Event streams in this format must always be encoded as UTF-8.

          @@ -99665,9 +99646,9 @@ dictionary StorageEventInit : EventInit {
          Decimal numeric character reference
          The ampersand must be followed by a U+0023 NUMBER SIGN character (#), followed by one or more - ASCII digits, representing a base-ten integer that corresponds to a Unicode code - point that is allowed according to the definition below. The digits must then be followed by a - U+003B SEMICOLON character (;).
          + ASCII digits, representing a base-ten integer that corresponds to a code point that + is allowed according to the definition below. The digits must then be followed by a U+003B + SEMICOLON character (;).
          Hexadecimal numeric character reference
          @@ -99675,15 +99656,14 @@ dictionary StorageEventInit : EventInit {
          The ampersand must be followed by a U+0023 NUMBER SIGN character (#), which must be followed by either a U+0078 LATIN SMALL LETTER X character (x) or a U+0058 LATIN CAPITAL LETTER X character (X), which must then be followed by one or more ASCII hex digits, - representing a hexadecimal integer that corresponds to a Unicode code point that is allowed - according to the definition below. The digits must then be followed by a U+003B SEMICOLON - character (;).
          + representing a hexadecimal integer that corresponds to a code point that is allowed according to + the definition below. The digits must then be followed by a U+003B SEMICOLON character (;). -

          The numeric character reference forms described above are allowed to reference any Unicode code - point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), - surrogates (U+D800–U+DFFF), and control characters other than ASCII +

          The numeric character reference forms described above are allowed to reference any code point + other than U+0000, U+000D, permanently undefined characters (noncharacters), surrogates, and control characters other than ASCII whitespace.

          An ambiguous ampersand is a U+0026 AMPERSAND @@ -99819,10 +99799,9 @@ dictionary StorageEventInit : EventInit {

          -

          The input to the HTML parsing process consists of a stream of Unicode code points, which is passed through a tokenization stage - followed by a tree construction stage. The output is a Document - object.

          +

          The input to the HTML parsing process consists of a stream of code + points, which is passed through a tokenization stage followed by a tree + construction stage. The output is a Document object.

          Implementations that do not support scripting do not have to actually create a DOM Document object, but the DOM tree in such cases is @@ -99861,10 +99840,10 @@ dictionary StorageEventInit : EventInit {

          The input byte stream

          -

          The stream of Unicode code points that comprises the input to the tokenization stage will be - initially seen by the user agent as a stream of bytes (typically coming over the network or from - the local file system). The bytes encode the actual characters according to a particular - character encoding, which the user agent uses to decode the bytes into characters.

          +

          The stream of code points that comprises the input to the tokenization stage will be initially + seen by the user agent as a stream of bytes (typically coming over the network or from the local + file system). The bytes encode the actual characters according to a particular character + encoding, which the user agent uses to decode the bytes into characters.

          For XML documents, the algorithm user agents are required to use to determine the character encoding is given by the XML specification. This section does not apply to XML @@ -100493,7 +100472,8 @@ dictionary StorageEventInit : EventInit {

          -
          A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')
          +
          A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (`<!--`)

          Advance the position pointer so that it points at the first 0x3E byte @@ -100584,14 +100564,14 @@ dictionary StorageEventInit : EventInit {

          -
          A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII alpha)
          +
          A sequence of bytes starting with a 0x3C byte (<), optionally a 0x2F byte (/), and + finally a byte in the range 0x41-0x5A or 0x61-0x7A (A-Z or a-z)
            -
          Advance the position pointer so that it points at the next 0x09 - (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E - (ASCII >) byte.

          +
          Advance the position pointer so that it points at the next 0x09 (HT), + 0x0A (LF), 0x0C (FF), 0x0D (CR), 0x20 (SP), or 0x3E (>) byte.

          Repeatedly get an attribute until no further attributes can be found, then jump to the step below labeled next
          @@ -100601,13 +100581,13 @@ dictionary StorageEventInit : EventInit {

          -
          A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')
          -
          A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')
          -
          A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')
          +
          A sequence of bytes starting with: 0x3C 0x21 (`<!`)
          +
          A sequence of bytes starting with: 0x3C 0x2F (`</`)
          +
          A sequence of bytes starting with: 0x3C 0x3F (`<?`)
          -

          Advance the position pointer so that it points at the first 0x3E byte - (ASCII >) that comes after the 0x3C byte that was found.

          +

          Advance the position pointer so that it points at the first 0x3E byte (>) that + comes after the 0x3C byte that was found.

          @@ -100632,10 +100612,11 @@ dictionary StorageEventInit : EventInit {
            -
          If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), - 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x2F (ASCII /) then advance position to the next byte and redo this step.

          +
          If the byte at position is one of 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), + 0x20 (SP), or 0x2F (/) then advance position to the next byte and redo this + step.

          -
          If the byte at position is 0x3E (ASCII >), then abort the

            If the byte at position is 0x3E (>), then abort the get an attribute algorithm. There isn't one.

          @@ -100647,34 +100628,33 @@ dictionary StorageEventInit : EventInit {
            -
            If it is 0x3D (ASCII =), and the attribute name is longer than the - empty string
            +
            If it is 0x3D (=), and the attribute name is longer than the empty string
            Advance position to the next byte and jump to the step below labeled value.
            -
            If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 - (ASCII space)
            +
            If it is 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), or 0x20 (SP)
            Jump to the step below labeled spaces.
            -
            If it is 0x2F (ASCII /) or 0x3E (ASCII >)
            +
            If it is 0x2F (/) or 0x3E (>)
            Abort the get an attribute algorithm. The attribute's name is the value of attribute name, its value is the empty string.
            -
            If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z)
            +
            If it is in the range 0x41 (A) to 0x5A (Z)
            -
            Append the Unicode character with code point b+0x20 to attribute name (where b - is the value of the byte at position). (This converts the input to - lowercase.)
            +
            Append the code point b+0x20 to attribute name + (where b is the value of the byte at position). (This converts the input + to lowercase.)
            Anything else
            -
            Append the Unicode character with the same code point as the value of the byte at position to attribute name. (It doesn't actually matter how - bytes outside the ASCII range are handled here, since only ASCII characters can contribute to - the detection of a character encoding.)
            +
            Append the code point with the same value as the byte at position to + attribute name. (It doesn't actually matter how bytes outside the ASCII range are + handled here, since only ASCII bytes can contribute to the detection of a character + encoding.)
            @@ -100683,24 +100663,25 @@ dictionary StorageEventInit : EventInit {
          Advance position to the next byte and return to the previous step.

          -
          Spaces: If the byte at position is one of 0x09 (ASCII TAB), - 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then advance position to the next byte, then, repeat this step.

          +
          Spaces: If the byte at position is one of 0x09 (HT), 0x0A (LF), 0x0C + (FF), 0x0D (CR), or 0x20 (SP) then advance position to the next byte and repeat + this step.

          -
          If the byte at position is not 0x3D (ASCII =), abort the - get an attribute algorithm. The - attribute's name is the value of attribute name, its value is the empty - string.

          +
          If the byte at position is not 0x3D (=), abort the get an attribute algorithm. The attribute's + name is the value of attribute name, its value is the empty string.

          -
          Advance position past the 0x3D (ASCII =) byte.

          +
          Advance position past the 0x3D (=) byte.

          -
          Value: If the byte at position is one of 0x09 (ASCII TAB), 0x0A - (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then advance position to the next byte, then, repeat this step.

          +
          Value: If the byte at position is one of 0x09 (HT), 0x0A (LF), 0x0C + (FF), 0x0D (CR), or 0x20 (SP) then advance position to the next byte and repeat + this step.

          Process the byte at position as follows:

            -
            If it is 0x22 (ASCII ") or 0x27 (ASCII ')
            +
            If it is 0x22 (") or 0x27 (')
            @@ -100714,12 +100695,12 @@ dictionary StorageEventInit : EventInit { "get an attribute" algorithm. The attribute's name is the value of attribute name, and its value is the value of attribute value.
          -
          Otherwise, if the value of the byte at position is in the range 0x41 - (ASCII A) to 0x5A (ASCII Z), then append a Unicode character to attribute - value whose code point is 0x20 more than the value of the byte at position.
          +
          Otherwise, if the value of the byte at position is in the range 0x41 (A) to + 0x5A (Z), then append a code point to attribute value whose value is 0x20 more + than the value of the byte at position.
          -
          Otherwise, append a Unicode character to attribute value whose code - point is the same as the value of the byte at position.
          +
          Otherwise, append a code point to attribute value whose value is the same as + the value of the byte at position.
          Return to the step above labeled quote loop.
          @@ -100727,20 +100708,23 @@ dictionary StorageEventInit : EventInit {
          -
            If it is 0x3E (ASCII >)
            +
            If it is 0x3E (>)
            Abort the get an attribute algorithm. The attribute's name is the value of attribute name, its value is the empty string.
            -
            If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z)
            +
            If it is in the range 0x41 (A) to 0x5A (Z)
            -
            Append the Unicode character with code point b+0x20 to attribute value (where b is the value of the byte at position). Advance position to the next byte.
            +
            Append a code point b+0x20 to attribute value + (where b is the value of the byte at position). Advance + position to the next byte.
            Anything else
            -
            Append the Unicode character with the same code point as the value of the byte at position to attribute value. Advance position to the next byte.
            +
            Append a code point with the same value as the byte at position to + attribute value. Advance position to the next byte.
          @@ -100751,20 +100735,21 @@ dictionary StorageEventInit : EventInit {
          -
          If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII - space), or 0x3E (ASCII >)
          +
          If it is 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), 0x20 (SP), or 0x3E (>)
          Abort the get an attribute algorithm. The attribute's name is the value of attribute name and its value is the value of attribute value.
          -
          If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z)
          +
          If it is in the range 0x41 (A) to 0x5A (Z)
          -
          Append the Unicode character with code point b+0x20 to attribute value (where b is the value of the byte at position).
          +
          Append a code point b+0x20 to attribute value + (where b is the value of the byte at position).
          Anything else
          -
          Append the Unicode character with the same code point as the value of the byte at position to attribute value.
          +
          Append a code point with the same value as the byte at position to + attribute value.
          @@ -100894,9 +100879,9 @@ dictionary StorageEventInit : EventInit { U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all - control characters or permanently undefined Unicode characters (noncharacters).

          + control characters or permanently undefined characters (noncharacters).

          -

          Any character that is a not a Unicode character, i.e. any isolated +

          Any character that is not a scalar value, i.e. any isolated surrogate, is a parse error. (These can only find their way into the input stream via script APIs such as document.write().)
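Since such input can only come from script, a sketch of how an isolated surrogate could be located in a JavaScript string: a high surrogate (U+D800 to U+DBFF) immediately followed by a low surrogate (U+DC00 to U+DFFF) forms a valid pair, and any other surrogate code unit is isolated. The function name is hypothetical.

```javascript
// Return the code-unit indices of every isolated surrogate in s.
function isolatedSurrogateIndices(s) {
  const found = [];
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) { i++; continue; } // valid pair
      found.push(i); // high surrogate with no low surrogate after it
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      found.push(i); // low surrogate not preceded by a high surrogate
    }
  }
  return found;
}

console.log(isolatedSurrogateIndices("\uDE00\uD83D"));
```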

          @@ -103604,7 +103589,7 @@ dictionary StorageEventInit : EventInit { -
          Number Unicode character +
          Number Code point
          0x00 0xFFFD REPLACEMENT CHARACTER @@ -103655,10 +103640,10 @@ dictionary StorageEventInit : EventInit { 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.

          -

          Set the temporary buffer to the empty string. Append the - Unicode character with code point equal to the character - reference code to the temporary buffer. Switch to the - character reference end state.

          +

          Set the temporary buffer to the empty string. Append a + code point equal to the character reference code to + the temporary buffer. Switch to the character reference + end state.

          Character reference end state
          @@ -108378,9 +108363,9 @@ document.body.appendChild(text);

          If the XML API being used restricts the allowable characters in the local names of elements and attributes, then the tool may map all element and attribute local names that the API wouldn't support to a set of names that are allowed, by replacing any character that isn't - supported with the uppercase letter U and the six digits of the character's Unicode code point - when expressed in hexadecimal, using digits 0-9 and capital letters A-F as the symbols, in - increasing numeric order.

          + supported with the uppercase letter U and the six digits of the character's code point when + expressed in hexadecimal, using digits 0-9 and capital letters A-F as the symbols, in increasing + numeric order.
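The name-mapping rule above can be sketched as follows. `isSupported` is a hypothetical stand-in for whatever the XML API accepts; the example predicate (ASCII letters, digits, `_`, `-`, `.`) is an assumption for illustration only. Iteration is by code point, so a character outside the BMP becomes a single `U` followed by six hex digits.

```javascript
// Replace each unsupported character with "U" plus its code point as six
// uppercase hexadecimal digits.
function mangleName(name, isSupported) {
  let out = "";
  for (const ch of name) {
    if (isSupported(ch)) {
      out += ch;
    } else {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(6, "0");
      out += "U" + hex;
    }
  }
  return out;
}

// Hypothetical predicate, for illustration only.
const exampleSupported = (ch) => /^[A-Za-z0-9_.-]$/.test(ch);

console.log(mangleName("foo<bar", exampleSupported));
```

Under this predicate, `foo<bar` maps to `fooU00003Cbar`, since U+003C is written as `00003C`.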

          For example, the element name foo<bar, which can be output by the HTML parser, though it is neither a legal HTML element name nor a @@ -114335,7 +114320,7 @@ interface External {

          Magic number(s):
          text/ping resources always consist of the four - bytes 0x50 0x49 0x4E 0x47 (ASCII 'PING').
          + bytes 0x50 0x49 0x4E 0x47 (`PING`).
          File extension(s):
          No specific file extension is recommended for this type.
          Macintosh file type code(s):
          @@ -116758,7 +116743,7 @@ interface External {
          accesskey HTML elements Keyboard shortcut to activate or focus element - Ordered set of unique space-separated tokens, case-sensitive, consisting of one Unicode code point in length + Ordered set of unique space-separated tokens, case-sensitive, consisting of one code point in length
          action form