HTML API: Provide mechanism to scan all tokens in an HTML document, not only the tags. #5683
…, end)

This patch follows up on earlier design questions around how to represent spans of strings inside the class. It's relevant now as preparation for WordPress#5683.

The mixture of (offset, length) and (start, end) coordinates becomes confusing at times and all final string operations are performed with the (offset, length) pair, since these feed into `strlen()`.

In preparation for exposing all tokens within an HTML document this change:

- Unifies the representation throughout the class.
- Creates `token_starts_at` to track the start of the current token.
- Replaces `tag_ends_at` with `token_length` for re-use with other token types.

There should be no functional or behavioral changes in this patch.

For the internal helper classes this patch introduces breaking changes, but those classes are marked private and should not be used outside of the HTML API itself.
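To illustrate why a single (offset, length) representation is convenient, here is a minimal sketch in plain PHP. The helper names are hypothetical and do not come from the patch; only `substr()` and `strlen()` are real PHP functions. With (start, end) coordinates a length has to be derived before each string operation, while (offset, length) can be passed straight through.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of the HTML API.

/** Extract a span stored as (start, end), where end is exclusive. */
function slice_by_start_end( string $html, int $start, int $end ): string {
	// The length must be derived before the string operation.
	return substr( $html, $start, $end - $start );
}

/** Extract a span stored as (offset, length). */
function slice_by_offset_length( string $html, int $offset, int $length ): string {
	// The pair maps directly onto the arguments of substr().
	return substr( $html, $offset, $length );
}

$html = '<p class="intro">Hello</p>';

// Variable names borrowed from the commit message; the values are made up.
$token_starts_at = 0;
$token_length    = strlen( '<p class="intro">' );

echo slice_by_offset_length( $html, $token_starts_at, $token_length ), "\n";
echo slice_by_start_end( $html, $token_starts_at, $token_starts_at + $token_length ), "\n";
```

Both calls extract the same opening tag; the second one simply has to do the subtraction that the first avoids.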
I really like the shape this is taking. I've left several thoughts, comments, and questions from this first pass. I'd like to take a look at the processing of comments because I think we can fix that @todo in this PR.
I also want to see what feedback the html5lib tests can give us so I'll take some time to see what it looks like to run them against this PR with additional handling of more node types (one of your todo items in the description).
I haven't gone through everything yet, only the main implementation changes.
 *  - `#text` nodes, whose entire token _is_ the modifiable text.
 *  - Comment nodes and nodes that became comments because of some syntax error. The
 *    text for these nodes is the portion of the comment inside of the syntax. E.g. for
 *    `<!-- comment -->` the text is `" comment "` (note that the spaces are part of it).
 *  - `CDATA` sections, whose text is the content inside of the section itself. E.g. for
 *    `<![CDATA[some content]]>` the text is `"some content"`.
 *  - "Funky comments," which are a special case of invalid closing tags whose name is
 *    invalid. The text for these nodes is the text that a browser would transform into
 *    an HTML comment when parsing. E.g. for `</%post_author>` the text is `%post_author`.
 *
 * And there are non-elements which are atomic in nature but have no modifiable text.
 *  - `DOCTYPE` nodes like `<!DOCTYPE html>` which have no closing tag.
 *  - The empty end tag `</>` which is ignored in the browser and DOM but exposed
 *    to the HTML API.
@sirreal I'm not sure yet on what to do here. can we tag this for follow-up after merge?
I'm a bit concerned about using specific property names here because this is supposed to be the explanatory section of the documentation and I don't want to couple the description to our own terms; I want it to read comfortably for someone coming in with an HTML background - that is, leave things a bit loose here to guide an understanding without pinning it to one specific technicality.
nonetheless I've taken another pass at the comment to update it based on how this has developed.
 * | *Text node*  | Found a #text node; this is plaintext and modifiable. |
 * | *CDATA node* | Found a CDATA section; this is modifiable.             |
 * | *Comment*    | Found a comment or bogus comment; this is modifiable.  |
case self::STATE_DOCTYPE:
	return '#doctype';
Why do we return `#doctype` here? The `html` value we'd get from `get_token_name` is confusing but aligns with what the browser does.
this is specifically to expose what kind of token it is. I don't like conflating it with the `HTML` tag name for an Element, even though one is lower-case and the other upper-case.
in my own explorations I found it helpful to have both functions: one to say the node name like the browser would, and one to say the node type (also like how the browser would). I've also been trying to balance the use of longer constants against clearly-searchable text values since this is a more consumer-oriented function.
switch ( $processor->get_token_type() ) {
case WP_HTML_Processor::NODE_TYPE_DOCUMENT_TYPE:
case '#doctype':
…
}
at this point I'm assuming people will use the string value even if the constant exists. also I started with `get_node_type()` and `get_node_name()` but then renamed to `_token_` because I wanted to support a slightly different set of kinds; I'm doubting this since discovering the challenge of partial documents with invalid comment syntaxes, but haven't completely abandoned the idea yet, particularly because of the support for presumptuous tags and funky comments, which aren't in the DOM API.
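The trade-off between constants and searchable string values can be sketched in plain PHP. The class below is a hypothetical stand-in, and the constant name only mirrors the one floated in the comment above; it is not confirmed to exist in WordPress. The point is that a constant whose value is the readable string lets callers match on either form.

```php
<?php
// Hypothetical stand-in class for illustration; not part of the HTML API.
final class Example_Token_Types {
	const NODE_TYPE_DOCUMENT_TYPE = '#doctype';
	const NODE_TYPE_TEXT          = '#text';
}

/** Describe a token type, accepting the constant or its plain string value. */
function describe_token_type( string $token_type ): string {
	switch ( $token_type ) {
		// Matching on the constant...
		case Example_Token_Types::NODE_TYPE_DOCUMENT_TYPE:
			return 'document type declaration';

		// ...or on the literal string both work, since the values are equal.
		case '#text':
			return 'text node';

		default:
			return 'something else';
	}
}

echo describe_token_type( '#doctype' ), "\n";
echo describe_token_type( Example_Token_Types::NODE_TYPE_TEXT ), "\n";
```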
Thanks, I think this makes sense 👍
Thanks! I think this is ready to merge.
$processor = WP_HTML_Processor::create_fragment( '<![CDATA[this is a comment]]>' );
$processor->next_token();

$this->assertSame(
	'#cdata-section',
	$processor->get_token_name(),
	"Should have found CDATA section name but found {$processor->get_token_name()} instead."
);
I don't want to get hung up on CDATA and PI handling, this doesn't block merging this PR.
I think this behavior is what you described in Slack here:
> …it finds those comments (to the first `>`), and then if it ends in `]]>` and starts with `<![CDATA[` we can safely say, "this is a CDATA node" … though the actual rules for those are more complicated and we can only support a subset now

We can discuss this more in a follow-up, but I'm reluctant to diverge from the specification. This is not a cdata-section with the text content `this is a comment` (unless we were in svg or math foreign content), this is a comment with the text content `[CDATA[this is a comment]]`.
yeah I'm torn a bit but also I find that things are slightly different since we're not building a DOM here. when considering the intentionality behind some HTML string, I think it's evident that if someone writes `<![CDATA[something]]>` then they clearly meant to produce what they consider a `CDATA` section, and WordPress itself still creates these for legacy reasons (even though it may be the case outside of WordPress's XML outputs that those aren't needed anymore).
so this does conflate with a comment whose text is `[CDATA[this is a comment]]`, but if we only indicate that, we have also lost the ability to differentiate these two strings, which in my opinion have divergent histories and intents:
    <!--[CDATA[this is a comment]]-->
    <![CDATA[this is a comment]]>
what I see as the potential failure here is that we hold fixed a comment structure someone can't get rid of because we're only allowing adjustment inside the `[` and `]`, but ultimately in the browser they both disappear as comments.
the case I was far more concerned with is the one we fixed, which is when we think that the inner text is `5 > 3` or `[CDATA[5 > 3]]` when in fact it's truly `5` or `[CDATA[5` since these represent a divergence in token boundaries from the browser (which we still have somewhat at play inside foreign elements).
I'm having similar vibes about representing `<!-->` and `<!--->` because right now we're not exposing those as changeable comments. again, someone might miss these because of the representation, but they won't cause the parser to get off track and they won't change the rendered view of the page.
let's keep talking because I'd like to push this as far as possible. I really want it to work that we expose these as separate entities. a possible compromise is to maintain a separate indicator specifying `type_of_comment` which could be `BOGUS_COMMENT`, `CDATA`, `VALID_COMMENT`, etc…, but that also introduces more API surface so I want to have a good feeling that it's necessary before putting it there.
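A rough sketch of what such a separate indicator could distinguish, written in plain PHP. The function is hypothetical and not part of the HTML API; the labels are the ones floated in the comment above, and the sketch assumes it is handed the full text of a token that was already identified as comment-like outside of foreign content. It also shows the boundary case from the discussion, where the token ends at the first `>`.

```php
<?php
// Hypothetical helper for illustration; not part of the HTML API.
// Abruptly-closed forms such as "<!-->" are not given their own label here.
function classify_comment_like_token( string $token ): string {
	$length = strlen( $token );

	// A well-formed comment: "<!--" ... "-->", with at least "<!---->".
	if ( $length >= 7 && 0 === strncmp( $token, '<!--', 4 ) && '-->' === substr( $token, -3 ) ) {
		return 'VALID_COMMENT';
	}

	// A CDATA lookalike: starts with "<![CDATA[" and ends with "]]>".
	if ( $length >= 12 && 0 === strncmp( $token, '<![CDATA[', 9 ) && ']]>' === substr( $token, -3 ) ) {
		return 'CDATA';
	}

	return 'BOGUS_COMMENT';
}

// The two strings from the discussion get different labels.
echo classify_comment_like_token( '<!--[CDATA[this is a comment]]-->' ), "\n";
echo classify_comment_like_token( '<![CDATA[this is a comment]]>' ), "\n";

// Boundary case: the invalid syntax ends at the first ">", so the token is
// shorter than the author probably intended and no longer ends in "]]>".
$html  = '<![CDATA[5 > 3]]>';
$token = substr( $html, 0, strpos( $html, '>' ) + 1 );
echo $token, ' is ', classify_comment_like_token( $token ), "\n";
```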
 *
 *     <!-->
 *     <!--->
 *     <!---->
This one is not abruptly closed, we have start `<!--` and end `-->`, with empty text content.
Here are two examples from the html5lib-tests. There's no comment error with `<!---->`, but with `<!--->` there's an "abrupt-closing-of-empty-comment" error.
thank you. the comment was wrong but the code appears to have been good.
I actually had a last-minute mini-panic thinking that we aren't detecting `<!---!>` as an abruptly-closed comment, but the Tag Processor is already right! it's not, and the comment continues. thankfully the code in this branch and in `trunk` handles it properly
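The cases discussed in this thread can be checked with a small plain-PHP sketch. The function is hypothetical and is not the Tag Processor's implementation; it assumes the input begins at a comment opener and only answers whether the comment is closed abruptly, meaning `<!--` followed directly by `>` or by `->`.

```php
<?php
// Hypothetical helper for illustration; not the Tag Processor's own code.
function is_abruptly_closed_comment( string $html ): bool {
	// Only text starting with a comment opener is considered.
	if ( 0 !== strncmp( $html, '<!--', 4 ) ) {
		return false;
	}

	// "<!-->" and "<!--->" close the comment before it has any content.
	return '>' === substr( $html, 4, 1 ) || '->' === substr( $html, 4, 2 );
}

$examples = array( '<!-->', '<!--->', '<!---->', '<!---!>' );

foreach ( $examples as $example ) {
	printf( "%-8s %s\n", $example, is_abruptly_closed_comment( $example ) ? 'abruptly closed' : 'not abruptly closed' );
}
```

Under this sketch `<!---->` is a normal empty comment and `<!---!>` is not closed at all at that point, which matches the conclusions reached above.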
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

- Coding style changes.
- WordPress/wordpress-develop#5762 Adds support for the "any other tag" sections in the HTML Processor.
- WordPress/wordpress-develop#5539 Adds support for list elements in the HTML Processor.
- WordPress/wordpress-develop#5897 Adds support for HR elements in the HTML Processor.
- WordPress/wordpress-develop#5895 Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements in the HTML Processor.
- WordPress/wordpress-develop#5903 Adds support for the PRE and LISTING elements in the HTML Processor.
- WordPress/wordpress-develop#5913 Updates "all other tags" support in HTML Processor and updates list of void elements.
- WordPress/wordpress-develop#5906 Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.
- WordPress/wordpress-develop#5683 Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatibility layer are merged and maintained in the Core repo and all changes or updates need to happen first in Core and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <[email protected]>
Co-authored-by: Jon Surrell <[email protected]>
I've reviewed all of the recent changes and they look good to me. I left a few suggestions and I want to make sure we add `LISTING` to the special handling that removes starting newlines for `PRE` and `TEXTAREA` content.
Since its introduction in WordPress 6.2 the HTML Tag Processor has provided a way to scan through all of the HTML tags in a document and then read and modify their attributes. In order to reliably do this, it also needed to be aware of other kinds of HTML syntax, but it didn't expose those syntax tokens to consumers of the API.

In this patch the Tag Processor introduces a new scanning method and a few helper methods to read information about or from each token. Most significantly, this introduces the ability to read `#text` nodes in the document.

What's new in the Tag Processor?
================================

- `next_token()` visits every distinct syntax token in a document.
- `get_token_type()` indicates what kind of token it is.
- `get_token_name()` returns something akin to `DOMNode.nodeName`.
- `get_modifiable_text()` returns the text associated with a token.
- `get_comment_type()` indicates why a token represents an HTML comment.

Example usage.
==============

```php
function strip_all_tags( $html ) {
	$text_content = '';
	$processor    = new WP_HTML_Tag_Processor( $html );

	while ( $processor->next_token() ) {
		// Skip everything that isn't a text node: tags, comments, etc.
		if ( '#text' !== $processor->get_token_type() ) {
			continue;
		}

		$text_content .= $processor->get_modifiable_text();
	}

	return $text_content;
}
```

What changes in the Tag Processor?
==================================

Previously, the Tag Processor would scan the opening and closing tag of every HTML element separately. Now, however, there are special tags which it only visits once, as if those elements were void tags without a closer. These are special tags because their content contains no other HTML or markup, only non-HTML content.

- SCRIPT elements contain raw text which is isolated from the rest of the HTML document and fed separately into a JavaScript engine. There are complicated rules to avoid escaping the script context in the HTML. The contents are left verbatim, and character references are not decoded.
- TEXTAREA and TITLE elements contain plain text which is decoded before display, e.g. transforming `&amp;` into `&`. Any markup which resembles tags is treated as verbatim text and not a tag.
- IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the textarea and title elements, but no character references are decoded. For example, `&amp;` inside a STYLE element is passed to the CSS engine as the literal string `&amp;` and _not_ as `&`.

Because it's important not to treat this inner content separately from the elements containing it, the Tag Processor combines them when scanning into a single match and makes their content available as modifiable text (see below). This means that the Tag Processor will no longer visit a closing tag for any of these elements unless that tag is unexpected.

    <title>There is only a single token in this line</title>
    <title>There are two tokens in this line></title></title>
    </title><title>There are still two tokens in this line></title>

What are tokens?
================

The term "token" here is a parsing term, which means a primitive unit in HTML. There are only a few kinds of tokens in HTML:

- a tag has a name, attributes, and a closing or self-closing flag.
- a text node, or `#text` node contains plain text which is displayed in a browser and which is decoded before display.
- a DOCTYPE declaration indicates how to parse the document.
- a comment is hidden from the display on a page but present in the HTML.

There are a few more kinds of tokens that the HTML Tag Processor will recognize, some of which don't exist as concepts in HTML. These mostly comprise XML syntax elements that aren't part of HTML (such as CDATA and processing instructions) and invalid HTML syntax that transforms into comments.

What is a funky comment?
========================

This patch treats a specific kind of invalid comment in a special way. A closing tag with an invalid name is considered a "funky comment." In the browser these become HTML comments just like any other, but their syntax is convenient for representing a variety of bits of information in a well-defined way and which cannot be nested or recursive, given the parsing rules handling this invalid syntax.

- `</1>`
- `</%avatar_url>`
- `</{"wp_bit": {"type": "post-author"}}>`
- `</[post-author]>`
- `</__( 'Save Post' );>`

All of these examples become HTML comments in the browser. The content inside the funky comment is easily parsable, whereby the only rule is that it starts at the `<` and continues until the nearest `>`. There can be no funky comment inside another, because that would imply having a `>` inside of one, which would actually terminate the first one.

What is modifiable text?
========================

Modifiable text is similar to the `innerText` property of a DOM node. It represents the span of text for a given token which may be modified without changing the structure of the HTML document or the token. There is currently no mechanism to change the modifiable text, but this is planned to arrive in a later patch.

Tags
====

Most tags have no modifiable text because they have child nodes where text nodes are found. Only the special tags mentioned above have modifiable text.

    <div class="post">Another day in HTML</div>
    └─ tag ──────────┘└─ text node ─────┘└────┴─ tag

    <title>Is <img> &gt; <image>?</title>
    │      └ modifiable text ───┘       │ "Is <img> > <image>?"
    └─ tag ─────────────────────────────┘

Text nodes
==========

Text nodes are entirely modifiable text.

    This HTML document has no tags.
    └─ modifiable text ───────────┘

Comments
========

The modifiable text inside a comment is the portion of the comment that doesn't form its syntax. This applies for a number of invalid comments.

    <!-- this is inside a comment -->
    │   └─ modifiable text ──────┘  │
    └─ comment token ───────────────┘

    <!-->
    This invalid comment has no modifiable text.

    <? this is an invalid comment -->
    │ └─ modifiable text ────────┘  │
    └─ comment token ───────────────┘

    <![CDATA[this is an invalid comment]]>
    │        └─ modifiable text ──────┘  │
    └─ comment token ────────────────────┘

Other token types also have modifiable text. Consult the code or tests for further information.
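The funky comment rule described in the commit message can be sketched in plain PHP. The helper below is hypothetical and is not how the Tag Processor is implemented; it assumes the caller already knows that the given offset sits in ordinary HTML content (not inside a SCRIPT element or another comment). It treats `</` followed by a letter as a normal closing tag, treats `</>` as the empty end tag, and otherwise returns everything up to the nearest `>`.

```php
<?php
// Hypothetical helper for illustration; not part of the HTML API.
// Returns the text of a funky comment at $at, or null if there isn't one.
function read_funky_comment_text( string $html, int $at = 0 ): ?string {
	if ( '</' !== substr( $html, $at, 2 ) ) {
		return null;
	}

	$first = (string) substr( $html, $at + 2, 1 );

	// A letter starts a normal closing tag, ">" makes the empty end tag "</>",
	// and running out of input means there is no complete token yet.
	if ( '' === $first || '>' === $first || 1 === preg_match( '/^[a-zA-Z]/', $first ) ) {
		return null;
	}

	// The comment runs until the nearest ">", which is why nesting is impossible.
	$closer = strpos( $html, '>', $at + 2 );
	if ( false === $closer ) {
		return null;
	}

	return substr( $html, $at + 2, $closer - ( $at + 2 ) );
}

$examples = array( '</1>', '</%avatar_url>', '</[post-author]>', '</div>', '</>' );

foreach ( $examples as $example ) {
	var_dump( read_funky_comment_text( $example ) );
}
```

Because the scan stops at the first `>`, any `>` that appears inside the intended payload would end the comment early, which is the constraint the commit message points out.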
Developed in #5683
Discussed in https://core.trac.wordpress.org/ticket/60170
Follows [57575]
Props bernhard-reiter, dlh, dmsnell, jonsurrell, zieladam
Fixes #60170
git-svn-id: https://develop.svn.wordpress.org/trunk@57348 602fd350-edb4-49c9-b593-d223f7449a82
Since its introduction in WordPress 6.2 the HTML Tag Processor has provided a way to scan through all of the HTML tags in a document and then read and modify their attributes. In order to reliably do this, it also needed to be aware of other kinds of HTML syntax, but it didn't expose those syntax tokens to consumers of the API. In this patch the Tag Processor introduces a new scanning method and a few helper methods to read information about or from each token. Most significantly, this introduces the ability to read `#text` nodes in the document. What's new in the Tag Processor? ================================ - `next_token()` visits every distinct syntax token in a document. - `get_token_type()` indicates what kind of token it is. - `get_token_name()` returns something akin to `DOMNode.nodeName`. - `get_modifiable_text()` returns the text associated with a token. - `get_comment_type()` indicates why a token represents an HTML comment. Example usage. ============== {{{ <?php function strip_all_tags( $html ) { $text_content = ''; $processor = new WP_HTML_Tag_Processor( $html ); while ( $processor->next_token() ) { if ( '#text' !== $processor->get_token_type() ) { continue; } $text_content .= $processor->get_modifiable_text(); } return $text_content; } }}} What changes in the Tag Processor? ================================== Previously, the Tag Processor would scan the opening and closing tag of every HTML element separately. Now, however, there are special tags which it only visits once, as if those elements were void tags without a closer. These are special tags because their content contains no other HTML or markup, only non-HTML content. - SCRIPT elements contain raw text which is isolated from the rest of the HTML document and fed separately into a JavaScript engine. There are complicated rules to avoid escaping the script context in the HTML. The contents are left verbatim, and character references are not decoded. 
- TEXTAREA and TITLE elements contain plain text which is decoded before display, e.g. transforming `&amp;` into `&`. Any markup which resembles tags is treated as verbatim text and not a tag.
- IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the textarea and title elements, but no character references are decoded. For example, `&amp;` inside a STYLE element is passed to the CSS engine as the literal string `&amp;` and _not_ as `&`.

Because it's important not to treat this inner content separately from the elements containing it, the Tag Processor combines them when scanning into a single match and makes their content available as modifiable text (see below).

This means that the Tag Processor will no longer visit a closing tag for any of these elements unless that tag is unexpected.

{{{
    <title>There is only a single token in this line</title>
    <title>There are two tokens in this line></title></title>
    </title><title>There are still two tokens in this line></title>
}}}

What are tokens?
================

The term "token" here is a parsing term, which means a primitive unit in HTML. There are only a few kinds of tokens in HTML:

- a tag has a name, attributes, and a closing or self-closing flag.
- a text node, or `#text` node contains plain text which is displayed in a browser and which is decoded before display.
- a DOCTYPE declaration indicates how to parse the document.
- a comment is hidden from the display on a page but present in the HTML.

There are a few more kinds of tokens that the HTML Tag Processor will recognize, some of which don't exist as concepts in HTML. These mostly comprise XML syntax elements that aren't part of HTML (such as CDATA and processing instructions) and invalid HTML syntax that transforms into comments.

What is a funky comment?
========================

This patch treats a specific kind of invalid comment in a special way. A closing tag with an invalid name is considered a "funky comment." In the browser these become HTML comments just like any other, but their syntax is convenient for representing a variety of bits of information in a well-defined way and which cannot be nested or recursive, given the parsing rules handling this invalid syntax.

- `</1>`
- `</%avatar_url>`
- `</{"wp_bit": {"type": "post-author"}}>`
- `</[post-author]>`
- `</__( 'Save Post' );>`

All of these examples become HTML comments in the browser. The content inside the funky comment is easily parsable, whereby the only rule is that it starts at the `<` and continues until the nearest `>`. There can be no funky comment inside another, because that would imply having a `>` inside of one, which would actually terminate the first one.

What is modifiable text?
========================

Modifiable text is similar to the `innerText` property of a DOM node. It represents the span of text for a given token which may be modified without changing the structure of the HTML document or the token.

There is currently no mechanism to change the modifiable text, but this is planned to arrive in a later patch.

Tags
====

Most tags have no modifiable text because they have child nodes where text nodes are found. Only the special tags mentioned above have modifiable text.

{{{
    <div class="post">Another day in HTML</div>
    └─ tag ──────────┘└─ text node ─────┘└────┴─ tag
}}}

{{{
    <title>Is <img> &gt; <image>?</title>
    │      └ modifiable text ───┘       │ "Is <img> > <image>?"
    └─ tag ─────────────────────────────┘
}}}

Text nodes
==========

Text nodes are entirely modifiable text.

{{{
    This HTML document has no tags.
    └─ modifiable text ───────────┘
}}}

Comments
========

The modifiable text inside a comment is the portion of the comment that doesn't form its syntax. This applies for a number of invalid comments.

{{{
    <!-- this is inside a comment -->
    │   └─ modifiable text ──────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <!-->
    This invalid comment has no modifiable text.
}}}

{{{
    <? this is an invalid comment -->
    │ └─ modifiable text ────────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <![CDATA[this is an invalid comment]]>
    │        └─ modifiable text ──────┘  │
    └─ comment token ────────────────────┘
}}}

Other token types also have modifiable text. Consult the code or tests for further information.

Developed in WordPress/wordpress-develop#5683
Discussed in https://core.trac.wordpress.org/ticket/60170

Follows [57575]

Props bernhard-reiter, dlh, dmsnell, jonsurrell, zieladam
Fixes #60170

Built from https://develop.svn.wordpress.org/trunk@57348

git-svn-id: http://core.svn.wordpress.org/trunk@56854 1a063a9b-81f0-0310-95a4-ce76da25c4cd
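The funky comment rule described in the commit message above (a closing-tag opener followed by something that cannot start a tag name, running to the nearest `>`) can be sketched in plain PHP. This is only an illustration under stated assumptions: `find_funky_comments()` is a hypothetical helper, not part of WordPress, and it ignores context such as SCRIPT contents or regular comments, which the real Tag Processor has to track.

```php
<?php
/**
 * Hypothetical helper (not WordPress code): collects the inner content of
 * every "funky comment" found by a naive left-to-right scan.
 */
function find_funky_comments( string $html ): array {
	$found = array();
	$at    = 0;

	while ( false !== ( $start = strpos( $html, '</', $at ) ) ) {
		// Character right after `</`, or '' when the input ends there.
		$next = $html[ $start + 2 ] ?? '';

		// A letter starts a real closing tag. `</>` and a trailing `</`
		// are separate cases which this sketch simply skips.
		if ( '' === $next || '>' === $next || 1 === preg_match( '/[a-zA-Z]/', $next ) ) {
			$at = $start + 2;
			continue;
		}

		// The comment runs until the nearest `>`, so nesting is impossible.
		$end = strpos( $html, '>', $start + 2 );
		if ( false === $end ) {
			break;
		}

		$found[] = substr( $html, $start + 2, $end - $start - 2 );
		$at      = $end + 1;
	}

	return $found;
}
```

Calling it on `</1><p></p></%avatar_url>` skips the genuine `</p>` closer and returns the two funky comment bodies, `1` and `%avatar_url`.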
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

- Coding style changes.
- WordPress/wordpress-develop#5762 Adds support for the "any other tag" sections in the HTML Processor.
- WordPress/wordpress-develop#5539 Adds support for list elements in the HTML Processor.
- WordPress/wordpress-develop#5897 Adds support for HR elements in the HTML Processor.
- WordPress/wordpress-develop#5895 Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements in the HTML Processor.
- WordPress/wordpress-develop#5903 Adds support for the PRE and LISTING elements in the HTML Processor.
- WordPress/wordpress-develop#5913 Updates "all other tags" support in HTML Processor and updates list of void elements.
- WordPress/wordpress-develop#5906 Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.
- WordPress/wordpress-develop#5683 Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatibility layer are merged and maintained in the Core repo. All changes or updates need to happen first in Core and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <[email protected]>
Co-authored-by: Jon Surrell <[email protected]>
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

- Coding style changes.
- WordPress/wordpress-develop#5762 Adds support for the "any other tag" sections in the HTML Processor.
- WordPress/wordpress-develop#5539 Adds support for list elements in the HTML Processor.
- WordPress/wordpress-develop#5897 Adds support for HR elements in the HTML Processor.
- WordPress/wordpress-develop#5895 Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements in the HTML Processor.
- WordPress/wordpress-develop#5903 Adds support for the PRE and LISTING elements in the HTML Processor.
- WordPress/wordpress-develop#5913 Updates "all other tags" support in HTML Processor and updates list of void elements.
- WordPress/wordpress-develop#5906 Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.
- WordPress/wordpress-develop#5683 Provides mechanism to scan all tokens in an HTML document in the Tag Processor.
- WordPress/wordpress-develop#5907 Adds support for the INPUT element in the HTML Processor.

The PHP files in the compatibility layer are merged and maintained in the Core repo. All changes or updates need to happen first in Core and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <[email protected]>
Co-authored-by: Jon Surrell <[email protected]>
When parser states were introduced in WordPress#5683, nothing in the `seek()` method reset the parser state. This is problematic because it could leave the parser in the wrong state after seeking.

In this patch the parser state is reset so that it gets properly adjusted on the successive call to `next_token()`.

Follows [57348]

Props @kevin940726 for finding and reporting.
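To illustrate the class of bug this fix addresses, here is a toy scanner that tracks a state flag alongside its cursor. It is a minimal sketch and not the Tag Processor's code: `Toy_Scanner`, its state constants, and `next_word()` are all hypothetical names chosen for the example.

```php
<?php
// Toy scanner (hypothetical, not WordPress code): like a parser, it
// remembers that it reached the end of its input.
final class Toy_Scanner {
	const STATE_READY    = 'ready';
	const STATE_COMPLETE = 'complete';

	private $words;
	private $at    = 0;
	private $state = self::STATE_READY;

	public function __construct( string $text ) {
		$this->words = preg_split( '/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY );
	}

	/** Returns the next word, or false once the input is exhausted. */
	public function next_word() {
		if ( self::STATE_COMPLETE === $this->state ) {
			return false;
		}

		if ( $this->at >= count( $this->words ) ) {
			$this->state = self::STATE_COMPLETE;
			return false;
		}

		return $this->words[ $this->at++ ];
	}

	/** Moves the cursor and resets the state that depended on the old position. */
	public function seek( int $index ): void {
		$this->at = $index;

		// Without this line a finished scanner would stay "complete"
		// and refuse to scan again after seeking backwards.
		$this->state = self::STATE_READY;
	}
}
```

Moving the cursor alone is not enough: any state derived from the old position has to be reset as well, so that the next scan call can recompute it.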
Updates from WordPress/wordpress-develop:

- From: WordPress/wordpress-develop@54a09a7
- To: WordPress/wordpress-develop@7a71339

- Coding style changes.
- WordPress/wordpress-develop#5762 Adds support for the "any other tag" sections in the HTML Processor.
- WordPress/wordpress-develop#5539 Adds support for list elements in the HTML Processor.
- WordPress/wordpress-develop#5897 Adds support for HR elements in the HTML Processor.
- WordPress/wordpress-develop#5895 Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements in the HTML Processor.
- WordPress/wordpress-develop#5903 Adds support for the PRE and LISTING elements in the HTML Processor.
- WordPress/wordpress-develop#5913 Updates "all other tags" support in HTML Processor and updates list of void elements.
- WordPress/wordpress-develop#5906 Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.
- WordPress/wordpress-develop#5907 Adds support for the INPUT element in the HTML Processor.
- WordPress/wordpress-develop#5683 Provides mechanism to scan all tokens in an HTML document in the Tag Processor.
- WordPress/wordpress-develop#5976 Avoids splitting text nodes on "<" character.
- WordPress/wordpress-develop#5992 Only recognize true CDATA-lookalike nodes.
- WordPress/wordpress-develop#5975 Prevent void tag nesting when calling `next_token()`.
- WordPress/wordpress-develop#6021 Reset parser state after seeking.
- https://core.trac.wordpress.org/changeset/57528 Fix typo in setting token flag.
- WordPress/wordpress-develop#6041 Ensure consecutive text is all joined into one text node.

The PHP files in the compatibility layer are merged and maintained in the Core repo. All changes or updates need to happen first in Core and then be brought over to Gutenberg as built files.

Co-authored-by: sergeybiryukov <[email protected]>
Co-authored-by: sirreal <[email protected]>
Co-authored-by: dmsnell <[email protected]>
* I18N: Prevent PHP warning in `WP_Textdomain_Registry`. Prevents a warning upon cache invalidation after language pack updates if the arguments don’t have the expected format. Follow-up to [57287], [57290], [57298], [57299]. See #58919. git-svn-id: https://develop.svn.wordpress.org/trunk@57303 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Twenty-Four: Remove extra tab character inside the text domain. Follow-up to [57281]. Props sabernhardt. Fixes #60245. git-svn-id: https://develop.svn.wordpress.org/trunk@57304 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Fix duplicate `determine_locale()` tests added in [57286]. Props johnbillion. See #58696. git-svn-id: https://develop.svn.wordpress.org/trunk@57305 602fd350-edb4-49c9-b593-d223f7449a82 * Embeds: Ensure the deprecated function `print_emoji_styles` isn't used Ensure that the proper new function wp_enqueue_emoji_styles is used in embeds. Follow-up to: [56194]. Props peterwilsoncc, bobbingwide, hellofromTonya. Fixes #59892. See: #58775. git-svn-id: https://develop.svn.wordpress.org/trunk@57306 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Fix unstable query tests. Three `WP_Query` tests could randomly fail due to an undefined order because two test posts were using the exact same `post_date`. Props boonebgorges, flixos90. Fixes #60288. git-svn-id: https://develop.svn.wordpress.org/trunk@57308 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Fix several typos in inline comments. Follow-up to [7747], [27419], [55155]. Props shailu25, sabernhardt. Fixes #60285. git-svn-id: https://develop.svn.wordpress.org/trunk@57309 602fd350-edb4-49c9-b593-d223f7449a82 * Media: Redirect inactive attachment pages for logged-out users. Ensure logged out users are redirected to the media file when attachment pages are inactive. This removes the `read_post` capability check from the canonical redirects as anonymous users lack the permission. Follow-up to [56657], [56658], [56711]. 
Props afercia, aristath, chesio, joppuyo, jorbin, lakshmananphp, poena, sergeybiryukov. Fixes #59866. See #57913. git-svn-id: https://develop.svn.wordpress.org/trunk@57310 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Twenty: Move the Inter font declaration to a separate file and enqueue the file. This allows the font to be dequeued by a child theme or plugin. Props poena, markhowellsmead, nielslange, Otto42, SGr33n, mukesh27, joemcgill. Fixes #48630. git-svn-id: https://develop.svn.wordpress.org/trunk@57311 602fd350-edb4-49c9-b593-d223f7449a82 * Bootstrap/Load: Introduce functions to check whether WordPress is serving a REST API request. This changeset introduces two functions: * `wp_is_serving_rest_request()` returns a boolean for whether WordPress is serving an actual REST API request. * `wp_is_rest_endpoint()` returns a boolean for whether a WordPress REST API endpoint is currently being used. While this is always the case if `wp_is_serving_rest_request()` returns `true`, the function additionally covers the scenario of internal REST API requests, i.e. where WordPress calls a REST API endpoint within the same request. Both functions should only be used after the `parse_request` action. All relevant manual checks have been adjusted to use one of the new functions, depending on the use-case. They were all using the same constant check so far, while in fact some of them were intending to check for an actual REST API request while others were intending to check for REST endpoint usage. A new filter `wp_is_rest_endpoint` can be used to alter the return value of the `wp_is_rest_endpoint()` function. Props lots.0.logs, TimothyBlynJacobs, flixos90, joehoyle, peterwilsoncc, swissspidy, SergeyBiryukov, pento, mikejolley, iandunn, hellofromTonya, Cybr, petitphp. Fixes #42061. git-svn-id: https://develop.svn.wordpress.org/trunk@57312 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Twenty: Add missing comma in `twentytwenty_classic_editor_styles()`. 
This resolves a WPCS error: {{{ There should be a comma after the last array item in a multi-line array. }}} Follow-up to [57311]. See #48630. git-svn-id: https://develop.svn.wordpress.org/trunk@57313 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Add support for HR element. Adds support for the following HTML elements to the HTML Processor: - HR Previously, this element was not supported and the HTML Processor would bail when encountering it. Now, with this patch, it will proceed to parse an HTML document when encountering one. Developed in WordPress/wordpress-develop#5897 Props jonsurrell, dmsnell Fixes #60283 git-svn-id: https://develop.svn.wordpress.org/trunk@57314 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Support deferred block variation initialization on the server. When registering blocks on the server using `register_block_type()` or similar functions, a set of block type variations can also be registered. However, in some cases building this variation data during block registration can be an expensive process, which is not needed in most contexts. To address this problem, this adds support to the `WP_Block_Type` object for a new property, `variation_callback`, which can be used to register a callback for building variation data only when the block variations data is needed. The `WP_Block_Type::variations` property has been changed to a private property that is now accessed through the magic `__get()` method. The magic getter makes use of a new public method, `WP_Block_Type::get_variations` which will build variations from a registered callback if variations have not already been built. Props spacedmonkey, thekt12, Mamaduka, gaambo, gziolo, mukesh27, joemcgill. Fixes #59969. git-svn-id: https://develop.svn.wordpress.org/trunk@57315 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Add support for BR, EMBED, & other tags. 
Adds support for the following HTML elements to the HTML Processor: - AREA, BR, EMBED, KEYGEN, WBR - Only the opening BR tag is supported, as the invalid closer `</br>` involves more complicated rules, to be implemented later. Previously, these elements were not supported and the HTML Processor would bail when encountering them. With this patch it will proceed to parse an HTML document when encountering those tags as long as other normal conditions don't cause it to bail (such as complicated format reconstruction rules). Props jonsurrell, dmsnell Fixes #60283 git-svn-id: https://develop.svn.wordpress.org/trunk@57316 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Add support for PRE and LISTING elements. Adds support for the following HTML elements to the HTML Processor: - PRE, LISTING Previously, these elements were not supported and the HTML Processor would bail when encountering them. Now, with this patch applied, it will proceed to parse an HTML document when encountering those tags. Developed in WordPress/wordpress-develop#5903 Props jonsurrell, dmsnell Fixes #60283 git-svn-id: https://develop.svn.wordpress.org/trunk@57317 602fd350-edb4-49c9-b593-d223f7449a82 * Media: Revert [57310]. This commit reintroduced a minor data exposure issue. Props swissspidy. See #59866, #57913. git-svn-id: https://develop.svn.wordpress.org/trunk@57318 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Cleanup tests and list of void elements. This patch adds newly supported elements to tests that should have been updated in recent PRs, but which were merged without that. Those PRs removed failing tests showing that the elements were unsupported, but did not add the elements to the list of supported ones. It also removes some elements from the special-exclusion list of unsupported IN BODY elements. These did not present in failing tests because earlier conditions in the switch structure caught the tags before hitting the default block. 
Finally it adds some missing elements to the list of void elements. These elements are not listed as void in the HTML specification because they are deprecated. However, they are treated as void for the sake of HTML serialization and the parsing rules indicate that they behave as void elements, so it's safe to list them within the HTML API as void. Developed in WordPress/wordpress-develop#5913 Fixes #60307 git-svn-id: https://develop.svn.wordpress.org/trunk@57319 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Correct the placement of `@global` tags in `wp-settings.php`. Props shailu25, mukesh27. Fixes #60146. git-svn-id: https://develop.svn.wordpress.org/trunk@57320 602fd350-edb4-49c9-b593-d223f7449a82 * Plugins: Correct table layout on smaller screens. This ensures that the message about deleting a plugin or having no plugins installed is displayed in full width. Follow-up to [26134], [33016]. Props shailu25, mukesh27, passoniate, JavierCasares, sabernhardt. Fixes #50069. git-svn-id: https://develop.svn.wordpress.org/trunk@57321 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Expand "imagemin" Grunt task to cover default themes. Runs `npm run grunt precommit:image` to minify/compress images in the repository. Props desrosj. Fixes #58996. git-svn-id: https://develop.svn.wordpress.org/trunk@57322 602fd350-edb4-49c9-b593-d223f7449a82 * Bundled Theme: Fix a couple of incorrect theme name references. Corrects the theme name used in docblocks in two places in Twenty Nineteen and Twenty Seventeen. Props shailu25, mukesh27. Fixes #60310. git-svn-id: https://develop.svn.wordpress.org/trunk@57323 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Twenty-Four: Update license information in readme. Adds missing license information for bundled fonts. Props acosmin, shailu25, poena, sabernhardt. 
Fixes #59838 git-svn-id: https://develop.svn.wordpress.org/trunk@57324 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Correct the `WP_User Query` location reference in query cache tests. Follow-up to [1047/tests], [33749], [55657]. See #59651. git-svn-id: https://develop.svn.wordpress.org/trunk@57325 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Support PARAM, SOURCE, and TRACK tags. Adds support for the following HTML elements to the HTML Processor: - PARAM, SOURCE, TRACK Previously these elements were not supported and the HTML Processor would bail when encountering them. Now, with this patch applied, it will proceed to parse an HTML document when encountering those tags. Props jonsurrell, dmsnell Fixes #60283 git-svn-id: https://develop.svn.wordpress.org/trunk@57326 602fd350-edb4-49c9-b593-d223f7449a82 * Script Modules API: Rename `wp_module` to `wp_script_module` Renames all mentions to "module" with "script module", including function names, comments, and tests. Follow up to [57269] The list of functions renamed are: - `wp_module()` -> `wp_script_module()`. - `wp_register_module()` -> `wp_register_script_module()`. - `wp_enqueue_module()` -> `wp_enqueue_script_module()`. - `wp_dequeue_module()` -> `wp_dequeue_script_module()`. - `WP_Script_Modules::print_enqueued_modules()` -> `WP_Script_Modules::print_enqueued_script_modules()`. - `WP_Script_Modules::print_module_preloads()` -> `WP_Script_Modules::print_script_module_preloads()`. It also adds PHP 7 typing to all the functions and improves the types of the `$deps` argument of `wp_register_script_module()` and `wp_enqueue_script_module()` using `@type`. Props luisherranz, idad5, costdev, nefff, joemcgill, jorbin, swisspidy, jonsurrel, flixos90, gziolo, westonruter, bernhard-reiter, kamranzafar4343 See #56313 git-svn-id: https://develop.svn.wordpress.org/trunk@57327 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: fix classname output on blocks without layout. 
Prevents layout classnames from being output on blocks with no layout support and no child layout classnames by returning early from `wp_render_layout_support_flag`. Props andrewserong. Fixes #60292. git-svn-id: https://develop.svn.wordpress.org/trunk@57328 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: fix fluid font division by zero error when min and max viewport widths are equal. Fixes a division error by returning null when `minViewportWidth` - `maxViewportWidth` is zero in `wp_get_computed_fluid_typography_value`. Props ramonopoly, mukesh27, andrewserong, audrasjb. Fixes #60263. git-svn-id: https://develop.svn.wordpress.org/trunk@57329 602fd350-edb4-49c9-b593-d223f7449a82 * Build Tools: Configure prettier properly. Allows tools like prettier or VSCode to auto-format JS files properly. It pulls the prettier config that is used in the Gutenberg repository. Props gziolo. Fixes #60316. git-svn-id: https://develop.svn.wordpress.org/trunk@57330 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Set show_tagcloud to false for Pattern Categories. Pattern Categories is a taxonomy used to categorize the patterns in the site editor. It is not meant to be shown in the frontend and show tag clouds. Props wildworks, mukesh27. Fixes #60119. git-svn-id: https://develop.svn.wordpress.org/trunk@57331 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Ensure PHPUnit10 compatibility for ThemeJson unit test. Expecting E_STRICT, E_NOTICE, and E_USER_NOTICE errors is deprecated in PHPUnit 10. This updates the test to rely on an exception instead. Props antonvlasenko. Fixes #60305. git-svn-id: https://develop.svn.wordpress.org/trunk@57332 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Update the ThemeJson unit test to cover custom CSS feature. In #59499 a fix was shipped to theme.json custom CSS when applied to blocks with multiple CSS selectors. This commit covers that fix with a unit test. Props wildworks. Fixes #60294. 
git-svn-id: https://develop.svn.wordpress.org/trunk@57333 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Define the labels of the pattern category taxonomy. In WordPress 6.5, the taxonomy is going to be rendered using a standard UI in the editor, this means that all the labels need to be defined properly. Props ntsekouras. Fixes #60322. git-svn-id: https://develop.svn.wordpress.org/trunk@57334 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Fix back to items label capitalization for the pattern categories. This uses the same capitalization used in Tags or Link Categories taxonomies. Props mukesh27. See #60322. git-svn-id: https://develop.svn.wordpress.org/trunk@57335 602fd350-edb4-49c9-b593-d223f7449a82 * General: Add $schema property to block and theme JSON files. Additionally, this changeset fixes some of the `block.json` and `theme.json` files in PHPUnit tests by adding missing `title` properties to satisfy the schema. Those changes have no impact on the runtime whatsoever and do not change the result of unit tests. Note that some block and theme JSON files still aren't valid according to the schema. Fixing is underway; the required changes will be merged subsequently. Props jonsurrell, dmsnell, gziolo. Fixes #60255. git-svn-id: https://develop.svn.wordpress.org/trunk@57336 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Introduce a more performant localization library. This introduces a more lightweight library for loading `.mo` translation files which offers increased speed and lower memory usage. It also supports loading multiple locales at the same time, which makes locale switching faster too. For plugins interacting with the `$l10n` global variable in core, a shim is added to retain backward compatibility with the existing `pomo` library. In addition to that, this library supports translations contained in PHP files, avoiding a binary file format and leveraging OPCache if available. 
If an `.mo` translation file has a corresponding `.l10n.php` file, the latter will be loaded instead. This behavior can be adjusted using the new `translation_file_format` and `load_translation_file` filters. PHP translation files will be typically created by downloading language packs, but can also be generated by plugins. See https://make.wordpress.org/core/2023/11/08/merging-performant-translations-into-core/ for more context. Props dd32, swissspidy, flixos90, joemcgill, westonruter, akirk, SergeyBiryukov. Fixes #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57337 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Add missing variable in string replacement. Ensures the preferred file name for lookup has the correct extension. Follow-up to [57337]. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57338 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Improve edge case handling in `WP_Translation_Controller`. Prevents PHP warnings for possibly undefined array keys. Also fixes incorrect `@covers` annotations. Follow-up to [57337]. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57339 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Unset reference used in foreach statement. In PHP it is a good practice to unset $value if it was created by reference in a foreach loop, as the reference is still valid outside the loop, and this avoids accidental bugs. Props get_dave. Fixes #60326. git-svn-id: https://develop.svn.wordpress.org/trunk@57340 602fd350-edb4-49c9-b593-d223f7449a82 * Script Loader: Only emit CDATA wrapper comments in `wp_get_inline_script_tag()` for JavaScript. This avoids erroneously adding CDATA wrapper comments for non-JavaScript scripts, including those for JSON such as the `importmap` for script modules in #56313. Props westonruter, flixos90, mukesh27, dmsnell. See #56313. Fixes #60320. 
git-svn-id: https://develop.svn.wordpress.org/trunk@57341 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Add missing full stop in `WP_Comment_Query::parse_query()` DocBlock. Props hardik2221. Fixes #60323. git-svn-id: https://develop.svn.wordpress.org/trunk@57342 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Support INPUT tags. Adds support for the following HTML elements to the HTML Processor: - INPUT Previously this element was not supported and the HTML Processor would bail when encountering one. Now, with this patch applied, it will proceed to parse the HTML document. Developed in https://github.com/WordPress/wordpress-develop/pull/5907 Discussed in https://core.trac.wordpress.org/ticket/60283 Props jonsurrell See #60283 git-svn-id: https://develop.svn.wordpress.org/trunk@57343 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Improve docblocks after [57337]. Props mukesh27. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57344 602fd350-edb4-49c9-b593-d223f7449a82 * Script Loader: Load the modules to the footer in classic themes Incremental import maps fail if the import map is printed after the module scripts. This means, we should always render import maps first. This means that for classic themes, we need to move the import map and modules to the footer because we can't know before that which modules are needed. Props luisherranz, cbravobernal. Fixes #60240. git-svn-id: https://develop.svn.wordpress.org/trunk@57345 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Scan all syntax tokens in a document, read modifiable text. Since its introduction in WordPress 6.2 the HTML Tag Processor has provided a way to scan through all of the HTML tags in a document and then read and modify their attributes. In order to reliably do this, it also needed to be aware of other kinds of HTML syntax, but it didn't expose those syntax tokens to consumers of the API. 
In this patch the Tag Processor introduces a new scanning method and a few helper methods to read information about or from each token. Most significantly, this introduces the ability to read `#text` nodes in the document. What's new in the Tag Processor? ================================ - `next_token()` visits every distinct syntax token in a document. - `get_token_type()` indicates what kind of token it is. - `get_token_name()` returns something akin to `DOMNode.nodeName`. - `get_modifiable_text()` returns the text associated with a token. - `get_comment_type()` indicates why a token represents an HTML comment. Example usage. ============== {{{ <?php function strip_all_tags( $html ) { $text_content = ''; $processor = new WP_HTML_Tag_Processor( $html ); while ( $processor->next_token() ) { if ( '#text' !== $processor->get_token_type() ) { continue; } $text_content .= $processor->get_modifiable_text(); } return $text_content; } }}} What changes in the Tag Processor? ================================== Previously, the Tag Processor would scan the opening and closing tag of every HTML element separately. Now, however, there are special tags which it only visits once, as if those elements were void tags without a closer. These are special tags because their content contains no other HTML or markup, only non-HTML content. - SCRIPT elements contain raw text which is isolated from the rest of the HTML document and fed separately into a JavaScript engine. There are complicated rules to avoid escaping the script context in the HTML. The contents are left verbatim, and character references are not decoded. - TEXTAREA and TITLE elements contain plain text which is decoded before display, e.g. transforming `&amp;` into `&`. Any markup which resembles tags is treated as verbatim text and not a tag. - IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the textarea and title elements, but no character references are decoded. 
For example, `&amp;` inside a STYLE element is passed to the CSS engine as the literal string `&amp;` and _not_ as `&`. Because it's important not to treat this inner content separately from the elements containing it, the Tag Processor combines them when scanning into a single match and makes their content available as modifiable text (see below). This means that the Tag Processor will no longer visit a closing tag for any of these elements unless that tag is unexpected. {{{ <title>There is only a single token in this line</title> <title>There are two tokens in this line></title></title> </title><title>There are still two tokens in this line></title> }}} What are tokens? ================ The term "token" here is a parsing term, which means a primitive unit in HTML. There are only a few kinds of tokens in HTML: - a tag has a name, attributes, and a closing or self-closing flag. - a text node, or `#text` node contains plain text which is displayed in a browser and which is decoded before display. - a DOCTYPE declaration indicates how to parse the document. - a comment is hidden from the display on a page but present in the HTML. There are a few more kinds of tokens that the HTML Tag Processor will recognize, some of which don't exist as concepts in HTML. These mostly comprise XML syntax elements that aren't part of HTML (such as CDATA and processing instructions) and invalid HTML syntax that transforms into comments. What is a funky comment? ======================== This patch treats a specific kind of invalid comment in a special way. A closing tag with an invalid name is considered a "funky comment." In the browser these become HTML comments just like any other, but their syntax is convenient for representing a variety of bits of information in a well-defined way and which cannot be nested or recursive, given the parsing rules handling this invalid syntax. 
- `</1>` - `</%avatar_url>` - `</{"wp_bit": {"type": "post-author"}}>` - `</[post-author]>` - `</__( 'Save Post' );>` All of these examples become HTML comments in the browser. The content inside the funky comment is easily parsable, whereby the only rule is that it starts at the `<` and continues until the nearest `>`. There can be no funky comment inside another, because that would imply having a `>` inside of one, which would actually terminate the first one. What is modifiable text? ======================== Modifiable text is similar to the `innerText` property of a DOM node. It represents the span of text for a given token which may be modified without changing the structure of the HTML document or the token. There is currently no mechanism to change the modifiable text, but this is planned to arrive in a later patch. Tags ==== Most tags have no modifiable text because they have child nodes where text nodes are found. Only the special tags mentioned above have modifiable text. {{{ <div class="post">Another day in HTML</div> └─ tag ──────────┘└─ text node ─────┘└────┴─ tag }}} {{{ <title>Is <img> &gt; <image>?</title> │ └ modifiable text ───┘ │ "Is <img> > <image>?" └─ tag ─────────────────────────────┘ }}} Text nodes ========== Text nodes are entirely modifiable text. {{{ This HTML document has no tags. └─ modifiable text ───────────┘ }}} Comments ======== The modifiable text inside a comment is the portion of the comment that doesn't form its syntax. This applies for a number of invalid comments. {{{ <!-- this is inside a comment --> │ └─ modifiable text ──────┘ │ └─ comment token ───────────────┘ }}} {{{ <!--> This invalid comment has no modifiable text. }}} {{{ <? this is an invalid comment --> │ └─ modifiable text ────────┘ │ └─ comment token ───────────────┘ }}} {{{ <![CDATA[this is an invalid comment]]> │ └─ modifiable text ───────┘ │ └─ comment token ───────────────────┘ }}} Other token types also have modifiable text. 
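As a rough illustration of the funky-comment rule described above, the following plain-PHP sketch collects the text between `</` and the nearest `>` whenever the character after `</` is not an ASCII letter. The function name `find_funky_comments()` is hypothetical and not part of WordPress, and the sketch deliberately ignores every other HTML construct (real comments, SCRIPT contents, attribute values) that the Tag Processor itself accounts for.

```php
<?php
/**
 * Hypothetical helper, NOT part of WordPress: lists the inner text of
 * every "funky comment" found by a naive left-to-right scan.
 *
 * Assumption: `</` followed by an ASCII letter is a normal closing tag,
 * `</>` is dropped, and anything else runs to the nearest `>`.
 */
function find_funky_comments( string $html ): array {
	$ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
	$found         = array();
	$at            = 0;

	while ( false !== ( $start = strpos( $html, '</', $at ) ) ) {
		$name_at = $start + 2;
		if ( $name_at >= strlen( $html ) ) {
			break; // `</` at the very end of the input: nothing to read.
		}

		// A letter starts a real closing tag; `</>` produces no token.
		if ( 1 === strspn( $html, $ascii_letters, $name_at, 1 ) || '>' === $html[ $name_at ] ) {
			$at = $name_at;
			continue;
		}

		// The funky comment ends at the nearest `>`, so it cannot nest.
		$end = strpos( $html, '>', $name_at );
		if ( false === $end ) {
			break;
		}

		$found[] = substr( $html, $name_at, $end - $name_at );
		$at      = $end + 1;
	}

	return $found;
}

// Yields array( '%avatar_url', '[post-author]' ); the `</p>` closer is skipped.
$funky = find_funky_comments( '<p>Hi</p></%avatar_url></[post-author]>' );
```

Because the scan stops at the first `>`, a second `</…>` can never sit inside the first, which is the non-nesting property the commit message relies on.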
Consult the code or tests for further information. Developed in https://github.com/WordPress/wordpress-develop/pull/5683 Discussed in https://core.trac.wordpress.org/ticket/60170 Follows [57575] Props bernhard-reiter, dlh, dmsnell, jonsurrell, zieladam Fixes #60170 git-svn-id: https://develop.svn.wordpress.org/trunk@57348 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Fix typo in `_get_block_template_file()` DocBlock. Follow-up to [55744]. See #59651. git-svn-id: https://develop.svn.wordpress.org/trunk@57349 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Rename `WP_Translation_Controller::instance()` method to `get_instance()`. This improves consistency as `get_instance()` is more commonly used in core. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57350 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Twenty-Four: Change font family slug to lowercase. Ensures referencing the correct CSS custom property. Props RavanH, poena, onemaggie, huzaifaalmesbah, mukesh27. Fixes #60325. git-svn-id: https://develop.svn.wordpress.org/trunk@57351 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Fix Theme.json application of custom root selector for styles. Theme.json stylesheets attempting to use a custom root selector are generated with incorrect styles. Props aaronrobertshaw, get_dave, mukesh27. Fixes #60343. git-svn-id: https://develop.svn.wordpress.org/trunk@57352 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add video and audio pattern categories. More categories, better organization for patterns as they grow and power more WordPress websites. Props aaronrobertshaw, get_dave. Fixes #60342. git-svn-id: https://develop.svn.wordpress.org/trunk@57353 602fd350-edb4-49c9-b593-d223f7449a82 * Block Hooks: Introduce a new `hooked_block_{$block_type}` filter. Add a new `hooked_block_{$block_type}` filter that allows modifying a hooked block (in parsed block format) prior to insertion, while providing read access to its anchor block (in the same format). 
This allows block authors to e.g. set a hooked block's attributes, or its inner blocks; the filter can peruse information about the anchor block when doing so. As such, this filter provides a solution to both #59572 and #60126. The new filter is designed to strike a good balance and separation of concerns with regard to the existing [https://developer.wordpress.org/reference/hooks/hooked_block_types/ `hooked_block_types` filter], which allows addition or removal of a block to the list of hooked blocks for a given anchor block -- all of which are identified only by their block ''types''. This new filter, on the other hand, only applies to ''one'' hooked block at a time, and allows modifying the entire (parsed) hooked block; it also gives (read) access to the parsed anchor block. Props gziolo, tomjcafferkey, andrewserong, isabel_brison, timbroddin, yansern. Fixes #59572, #60126. git-svn-id: https://develop.svn.wordpress.org/trunk@57354 602fd350-edb4-49c9-b593-d223f7449a82 * Block Hooks: Amend PHPDoc for `hooked_block_{$hooked_block_type}` filter. Add missing explanation of the dynamic part of the hook name. Follow-up [57354]. Props swissspidy. See #59572, #60126. git-svn-id: https://develop.svn.wordpress.org/trunk@57355 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Fix a few typos in `wp-includes/pomo/po.php`. Props shailu25. Fixes #60346. git-svn-id: https://develop.svn.wordpress.org/trunk@57356 602fd350-edb4-49c9-b593-d223f7449a82 * Media: Redirect inactive attachment pages for logged-out users. Ensure logged out users are redirected to the media file when attachment pages are inactive. This removes the read_post capability check from the canonical redirects as anonymous users lack the permission. This was previously committed in [57310] before being reverted in [57318]. This update includes a fix to cover instances where revealing a URL could be considered a data leak and greatly expands the unit tests to ensure that this is covered along with many other instances. 
Follow-up to [56657], [56658], [56711], [57310], [57318]. Props peterwilsoncc, jorbin, afercia, aristath, chesio, joppuyo, jorbin, lakshmananphp, poena, sergeybiryukov, swissspidy, johnbillion. Fixes #59866. See #57913. git-svn-id: https://develop.svn.wordpress.org/trunk@57357 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Tests: Ensure set_error_handler is cleaned up. Follow up to: [57332]. Fixes #60305. git-svn-id: https://develop.svn.wordpress.org/trunk@57361 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Update third-party GitHub Actions. This updates the following third-party GitHub Actions to their latest versions: - `actions/setup-node` from `3.8.1` to `4.0.1` - `actions/upload-artifact` from `3.1.2` to `4.3.0` - `shivammathur/setup-php` from `2.28.0` to `2.29.0` - `actions/cache` from `3.3.2` to `4.0.0` - `codecov/codecov-action` from `3.1.4` to `3.1.5` Most notably, these updates silence newly encountered notices as a result of GitHub beginning to transition away from Node.js 16 to Node.js 20 (see https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/). Props swissspidy. See #59805. git-svn-id: https://develop.svn.wordpress.org/trunk@57362 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Update the `caniuse` data. This updates the `caniuse-lite` database and includes all resulting CSS and built file changes, which are all minor changes due to fluctuations in browser usage. Props gziolo, jonsurrell. See #59657. git-svn-id: https://develop.svn.wordpress.org/trunk@57363 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Add missing escaping in `Custom_Image_Header::step_2()`. Follow-up to [4673], [14907]. Props nareshbheda, audrasjb, kebbet. Fixes #59278. git-svn-id: https://develop.svn.wordpress.org/trunk@57364 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Fix some spaces on block-supports background. 
When we run composer format these changes are applied so I guess we should just commit them to avoid seeing the changes again in the future. git-svn-id: https://develop.svn.wordpress.org/trunk@57365 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add original_source and author_text to the templates REST API. For the new "All templates" UI to work properly we need the REST API to provide two additional fields, original_source and author_text. Props ntsekouras, get_dave. Fixes #60358. git-svn-id: https://develop.svn.wordpress.org/trunk@57366 602fd350-edb4-49c9-b593-d223f7449a82 * Script Loader: Clarify in docs that `wp_get_inline_script_tag()` and `wp_print_inline_script_tag()` can take non-JS data. Props vladimiraus. Fixes #60331. git-svn-id: https://develop.svn.wordpress.org/trunk@57367 602fd350-edb4-49c9-b593-d223f7449a82 * Tests: Expand `sanitize_text_field()` tests. This change ensures that the `sanitize_text_field` and `sanitize_textarea_field` filters are correctly invoked for the respective functions. Follow-up to [38944]. Props pbearne, audrasjb. Fixes #60357. git-svn-id: https://develop.svn.wordpress.org/trunk@57368 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Add missing escaping functions to `WP_Customize_Control` and `WP_Customize_Nav_Menu_Location_Control`. Follow-up to [20295], [32806]. Props nareshbheda, shailu25, sabernhardt, audrasjb. Fixes #60324. git-svn-id: https://develop.svn.wordpress.org/trunk@57369 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Improve various globals documentation, as per docblock standards. Props upadalavipul, audrasjb, shailu25, viralsampat. Fixes #59255. See #59651. git-svn-id: https://develop.svn.wordpress.org/trunk@57370 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Typo correction in `wp_internal_hosts` docblock. Follow-up to [55289]. Props shailu25. Fixes #60363. 
git-svn-id: https://develop.svn.wordpress.org/trunk@57371 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Use strict type check for `in_array()` in `get_hooked_block_markup()`. This aims to prevent type juggling causing incorrect results. Follow-up to [57157]. Props jrf. See #60279. git-svn-id: https://develop.svn.wordpress.org/trunk@57372 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add registry for block binding sources It is part of the sync from the Gutenberg plugin that introduces the registry for block binding sources required for the new Block Bindings API: https://github.com/WordPress/gutenberg/issues/54536. See #60282. Props czapla, artemiosans, santosguillamot, sc0ttkclark, lgladdy, talldanwp, swissspidy, youknowriad, fabiankaegy. git-svn-id: https://develop.svn.wordpress.org/trunk@57373 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Remove unnecessary access and internal annotations from two functions in WP_REST_Templates_Controller. This commit removes unnecessary access and internal annotations from two functions that are private and as such don't require the annotation. It also adds the since annotation with the 6.5 release given that the annotation may be useful. Props swissspidy. See #60358. git-svn-id: https://develop.svn.wordpress.org/trunk@57374 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add Block Bindings API helpers It is part of the sync from the Gutenberg plugin that introduces the registry for block binding sources required for the new Block Bindings API: WordPress/gutenberg#54536. See #60282. Follow-up [57373]. Props czapla, artemiosans, santosguillamot, sc0ttkclark, lgladdy, talldanwp, swissspidy, youknowriad, fabiankaegy, mukesh27. git-svn-id: https://develop.svn.wordpress.org/trunk@57375 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Update third-party Slack action. This updates the `slackapi/slack-github-action` from `1.24.0` to `1.25.0`. This fixes more GitHub Action deprecated notices. 
Follow up to [57362]. See #59805. git-svn-id: https://develop.svn.wordpress.org/trunk@57376 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Update the WordPress packages to the Gutenberg 16.7 RC2 version. This patch, somewhat small brings a lot to WordPress. This includes features like: - DataViews. - Customization tools like box shadow, background size and repeat. - UI improvements in the site editor. - Preferences sharing between the post and site editors. - Unified panels and editors between post and site editors. - Improved template mode in the post editor. - Iterations to multiple interactive blocks. - Preparing the blocks and UI for pattern overrides. - and a lot more. Props luisherranz, gziolo, isabel_brison, costdev, jonsurrell, peterwilsoncc, get_dave, antonvlasenko, desrosj. See #60315. git-svn-id: https://develop.svn.wordpress.org/trunk@57377 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Update PHPCS to version 3.8.1. PHPCS has seen two new releases since the update to WPCS 3.0, with especially the 3.8.0 version containing a huge number of improvements. References: * [https://github.com/PHPCSStandards/PHP_CodeSniffer/releases/tag/3.8.0 PHP_CodeSniffer 3.8.0 release notes] * [https://github.com/PHPCSStandards/PHP_CodeSniffer/releases/tag/3.8.1 PHP_CodeSniffer 3.8.1 release notes] Follow-up to [56695]. Props jrf, swissspidy. Fixes #60279. git-svn-id: https://develop.svn.wordpress.org/trunk@57378 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Test against MySQL 8.3 Version 8.3 is the latest short-term innovation release of MySQL. See #59779. git-svn-id: https://develop.svn.wordpress.org/trunk@57379 602fd350-edb4-49c9-b593-d223f7449a82 * REST API: Support assigning terms when creating attachments. Props mukesh27, Dharm1025, Ankit K Gupta, swissspidy, dharm1025, tanjimtc71, timothyblynjacobs, spacedmonkey. Fixes #57897. 
git-svn-id: https://develop.svn.wordpress.org/trunk@57380 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Ensure `.l10n.php` files are deleted when upgrading language packs. Props amieiro. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57381 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Delete `.l10n.php` files when deleting a theme. Follow-up to [57337] where this was already added for plugins. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57382 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Fix PHP warning in Layout block support. strpos was triggering a PHP warning. This also updates the code to use the now-supported str_contains. Props get_dave, dmsnell, ocean90, mukesh27. Fixes #60327. git-svn-id: https://develop.svn.wordpress.org/trunk@57383 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Update the minimum compatible version of Gutenberg. Previous Gutenberg versions are not compatible with recent trunk because of the WP_Navigation_Block_Renderer classname. It's present in both. Gutenberg has been updated to avoid the use of this class but we need to auto-disable old plugins to avoid fatals. Props hellofromtonya. See #60315. git-svn-id: https://develop.svn.wordpress.org/trunk@57384 602fd350-edb4-49c9-b593-d223f7449a82 * Tests: Remove redundant unregister call in block bindings tear down Only block bindings sources registered in the tests should get unregistered. Follow-up for [57375]. See #60282. Props czapla. git-svn-id: https://develop.svn.wordpress.org/trunk@57385 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Improve singular lookup of pluralized strings. Ensures that looking up a singular that is also used as a pluralized string works as expected. This improves compatibility for cases where for example both `__( 'Product' )` and `_n( 'Product', 'Products', num )` are used in a project, where both will use the same translation for the singular version. 
Although such usage is not really recommended nor documented, it must continue to work in the new i18n library in order to maintain backward compatibility and maintain expected behavior. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57386 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Add missing space after `foreach` keyword. Follow-up to [57386]. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57387 602fd350-edb4-49c9-b593-d223f7449a82 * Uploads: Check for and verify ZIP archives. Props costdev, peterwilsoncc, azaozz, tykoted, johnbillion, desrosj, afragen, jorbin. git-svn-id: https://develop.svn.wordpress.org/trunk@57388 602fd350-edb4-49c9-b593-d223f7449a82 * Install: When populating options, maybe_serialize instead of always serialize. Props xknown, peterwilsoncc, jorbin, desrosj. git-svn-id: https://develop.svn.wordpress.org/trunk@57389 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Fix splitting single text node. When `next_token()` was introduced, it brought a subtle bug. When encountering a `<` in the HTML stream which did not lead to a tag or comment or other token, it was treating the full text span to that point as one text node, and the following span another text node. The entire span should be one text node. In this patch the Tag Processor properly detects this scenario and combines the spans into one text node. Follow-up to [57348] Props jonsurrell Fixes #60385 git-svn-id: https://develop.svn.wordpress.org/trunk@57489 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: reduce specificity of block style variation selector. Removes duplicate classname from the block style variation selector generated in `WP_Theme_JSON`’s `get_blocks_metadata` function. Props flixos90, joemcgill, mukesh27, isabel_brison. Fixes #60312. git-svn-id: https://develop.svn.wordpress.org/trunk@57490 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: introduce `dimensions.aspectRatio` block support. 
Adds front end rendering logic for the `dimensions.aspectRatio` block support as well as the required logic in `WP_Theme_JSON` and the style engine. Props andrewserong. Fixes #60365. git-svn-id: https://develop.svn.wordpress.org/trunk@57491 602fd350-edb4-49c9-b593-d223f7449a82 * Script Modules API: Add import map polyfill for older browsers Syncs the changes from https://github.com/WordPress/gutenberg/pull/58263. Adds a polyfill to make import maps compatible with unsupported browsers (https://caniuse.com/import-maps). Fixes #60348. Props cbravobernal, jorbin, luisherranz, jonsurrell. git-svn-id: https://develop.svn.wordpress.org/trunk@57492 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add `viewStyle` property to `block.json` for frontend-only block styles Related issue in Gutenberg: https://github.com/WordPress/gutenberg/issues/54491. For block scripts there was already `script`, `viewScript` and `editorScript`. For block styles there was only `style` and `editorStyle`. This brings parity. Props gaambo. Fixes #59673. git-svn-id: https://develop.svn.wordpress.org/trunk@57493 602fd350-edb4-49c9-b593-d223f7449a82 * REST API: Add route for single styles revisions. Adds a route for single global styles revisions: /wp/v2/global-styles/${ parentId }/revisions/${ revisionsId } This fixes the `getRevision` actions in the core-data package. Props ramonopoly, get_dave. Fixes #59810. git-svn-id: https://develop.svn.wordpress.org/trunk@57494 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Eleven: Fix typo in `twentyeleven_widgets_init()` description. Follow-up to [17738]. Props harshgajipara. See #60383. git-svn-id: https://develop.svn.wordpress.org/trunk@57495 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Sanitize nested array in theme.json properly. WP_Theme_JSON sanitization is now able to sanitize data contained in indexed arrays. So certain data from theme.json, for example, settings.typography.fontFamilies which is a JSON array will be sanitized. 
Props mmaattiiaass, mukesh27. Fixes #60360. git-svn-id: https://develop.svn.wordpress.org/trunk@57496 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Fix Theme.json font settings in unit test. These changes fix incorrect font settings when testing the generation of a theme.json stylesheet. Props aaronrobertshaw, mukesh27. Fixes #60341. git-svn-id: https://develop.svn.wordpress.org/trunk@57497 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Fix Theme.json font settings unit test. This file was omitted from the previous commit [57497]. See #60341. git-svn-id: https://develop.svn.wordpress.org/trunk@57498 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Update WordPress packages to Gutenberg 16.7 RC3. It brings a set of iterations and follow-ups to the initial package update. It also fixes a regression that happened for interactive blocks. Props gziolo, luisherranz, cbravobernal. See #60315. git-svn-id: https://develop.svn.wordpress.org/trunk@57499 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: fix small typos in block bindings API docblocks. Props shailu25. See #60282. Fixes #60386. git-svn-id: https://develop.svn.wordpress.org/trunk@57500 602fd350-edb4-49c9-b593-d223f7449a82 * HTTP API: Ensure cookie names are cast to strings. Props nosilver4u, darssen, kraftbj, engahmeds3ed, barry.hughes, schlessera. Fixes #58566. git-svn-id: https://develop.svn.wordpress.org/trunk@57501 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Twenty-Three: Rename Comments template part. This renames the Comments template part to 'Comments Template Part', to reduce confusion with the 'Comments' block when viewing both in the inserter. Props mikachan, mukesh27, poena. Fixes #56999. git-svn-id: https://develop.svn.wordpress.org/trunk@57502 602fd350-edb4-49c9-b593-d223f7449a82 * Script Loader: Use a global variable in `wp_script_modules()`. 
This brings the function more in line with its related `wp_scripts()` and `wp_styles()` functions and makes it easier to reset the class instance in tests. Props westonruter, luisherranz. See #56313. git-svn-id: https://develop.svn.wordpress.org/trunk@57503 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Load new translation library in `wp_load_translations_early()`. Ensures localization continues to work as expected with the new library in case translations need to be loaded early in the process. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57504 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Revert [57386] pending further investigation. Reverts the change for fallback string lookup due to a performance regression in the bad case scenario. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57505 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Fix CDATA lookalike matching invalid CDATA When `next_token()` was introduced to the HTML Tag Processor, it started classifying comments that look like they were intended to be CDATA sections. In one of the changes made during development, however, a typo slipped through code review that treated comments as CDATA even if they only ended in `]>` and not the required `]]>`. The consequences of this defect were minor because in all cases these are treated as HTML comments from invalid syntax, but this patch adds the missing check to ensure the proper reporting of CDATA-lookalikes. Follow-up to [57348] Props jonsurrell Fixes #60406 git-svn-id: https://develop.svn.wordpress.org/trunk@57506 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Fix void tag nesting with next_token When `next_token()` was introduced, it introduced a regression in the HTML Processor whereby void tags remain on the stack of open elements when they shouldn't. This led to invalid values returned from `get_breadcrumbs()`. 
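To see why a void tag left on the stack of open elements corrupts breadcrumbs, here is a simplified model in plain PHP. It is a sketch under stated assumptions, not the HTML Processor's code: `sketch_breadcrumbs()` and its `array( $name, $is_closer )` token format are hypothetical, and the void-element list is the one from the HTML specification.

```php
<?php
/**
 * Hypothetical model, NOT the WordPress implementation: returns the
 * breadcrumb trail seen at each token of a simplified tag stream.
 *
 * Each token is array( string $tag_name, bool $is_closer ).
 */
function sketch_breadcrumbs( array $tokens ): array {
	// Void elements as listed in the HTML specification.
	$void_elements = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );
	$stack         = array();
	$breadcrumbs   = array();

	foreach ( $tokens as list( $name, $is_closer ) ) {
		if ( $is_closer ) {
			// Pop up to and including the matching opener, if there is one.
			if ( in_array( $name, $stack, true ) ) {
				do {
					$popped = array_pop( $stack );
				} while ( $popped !== $name );
			}
			$breadcrumbs[] = implode( ' > ', $stack );
			continue;
		}

		$stack[]       = $name;
		$breadcrumbs[] = implode( ' > ', $stack );

		// The step the regression skipped: a void element has no closer,
		// so it must leave the stack before the next token is processed.
		if ( in_array( $name, $void_elements, true ) ) {
			array_pop( $stack );
		}
	}

	return $breadcrumbs;
}

// Models <div><img><p></p></div>.
$seen = sketch_breadcrumbs(
	array(
		array( 'DIV', false ),
		array( 'IMG', false ),
		array( 'P', false ),
		array( 'P', true ),
		array( 'DIV', true ),
	)
);
// $seen is array( 'DIV', 'DIV > IMG', 'DIV > P', 'DIV', '' ).
```

Without the final `array_pop()` for void elements, the third entry in this model would read `DIV > IMG > P`, which is the kind of invalid breadcrumb the commit describes.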
The reason was that calling `next_token()` works through a different code path than the HTML Processor runs everything else. To solve this, its sub-classed `next_token()` called `step( self::REPROCESS_CURRENT_TOKEN )` so that the proper HTML accounting takes place. Unfortunately that same reprocessing code path skipped the step whereby void and self-closing elements are popped from the stack of open elements. In this patch, that step is run with a third mode for `step()`, which is the new `self::PROCESS_CURRENT_TOKEN`. This mode acts as if `self::PROCESS_NEXT_NODE` were called, except it doesn't advance the parser. Developed in https://github.com/WordPress/wordpress-develop/pull/5975 Discussed in https://core.trac.wordpress.org/ticket/60382 Follow-up to [57348] Props dmsnell, jonsurrell Fixes #60382 git-svn-id: https://develop.svn.wordpress.org/trunk@57507 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Test cleanup Rename `$p` variable to `$processor` in tests for clarity. Use static data providers. A mix of static and non-static data providers were used in HTML API tests. Data providers are required to be static in the next PHPUnit version and there's no harm in using them consistently now. Follow-up to [57507] Props jonsurrell See #59647 git-svn-id: https://develop.svn.wordpress.org/trunk@57508 602fd350-edb4-49c9-b593-d223f7449a82 * Docs: Fix typo in `do_robots()` docblock. This was introduced in [45928]. Props shailu25, mukesh27. Fixes #60405. git-svn-id: https://develop.svn.wordpress.org/trunk@57509 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Remove shadow support via direct attribute. Shadow block support should always rely on the style attribute instead. Props madhudollu. Fixes #60377. git-svn-id: https://develop.svn.wordpress.org/trunk@57510 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add deprecated functions from interactivity core blocks. 
In 6.5 we are removing a couple of functions in Core blocks that were enqueuing the files needed to add that interactivity. Interactivity is handled with modules, so those functions are not needed anymore and are deprecated. Props swissspidy, cbravobernal. Fixes #60380. git-svn-id: https://develop.svn.wordpress.org/trunk@57511 602fd350-edb4-49c9-b593-d223f7449a82 * Twenty Fifteen: Fix typo in `css/blocks.css`. Follow-up to [43798]. Props shailu25, harshgajipara. Fixes #60383. git-svn-id: https://develop.svn.wordpress.org/trunk@57512 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Improve singular lookup of pluralized strings. Ensures that string lookup in MO files only uses the singular string. This matches expected behavior with gettext files and improves compatibility for cases where for example both `__( 'Product' )` and `_n( 'Product', 'Products', num )` are used in a project, where both will use the same translation for the singular version. Maintains backward compatibility and feature parity with the pomo library and the PHP translation file format. Replaces [57386], which was reverted in [57505], with a more accurate and performant solution. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57513 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add the Block Bindings API. This introduces the Block Bindings API for WordPress. The API allows developers to connect block attributes to different sources. In this PR, two such sources are included: "post meta" and "pattern". Attributes connected to sources can have their HTML replaced by values coming from the source in a way defined by the binding. Props czapla, lgladdy, gziolo, sc0ttkclark, swissspidy, artemiosans, kevin940726, fabiankaegy, santosguillamot, talldanwp, wildworks. Fixes #60282. git-svn-id: https://develop.svn.wordpress.org/trunk@57514 602fd350-edb4-49c9-b593-d223f7449a82 * Media: Prevent local edits during media upload. 
Prevent `options.allowLocalEdits` from toggling to true during the upload cycle. Otherwise, media meta fields can be edited, but the data will be lost as soon as the upload process is completed. Props codepo8, oglekler, nicolefurlan, antpb, syamraj24, joedolson. Fixes #58783, #23374. git-svn-id: https://develop.svn.wordpress.org/trunk@57515 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Support loading `.l10n.php` translation files on their own. Adjusts the translation file lookup in `WP_Textdomain_Registry` so that just-in-time translation loading works even if there is only a `.l10n.php` translation file without a corresponding `.mo` file. While language packs continue to contain both file types, this makes it easier to use translations in a project without having to deal with `.mo` or `.po` files. Props Chrystl. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57516 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Introduce Props Bot workflow. Props Bot is a new GitHub Action that will compile a list of contributors for a given pull request. The bot will leave a comment with a list of contributors formatted for use in both Trac SVN and GitHub. Props dharm1025, desrosj, jorbin, jeffpaul, dd32, pento, gziolo, swissspidy, talldanwp, noisysocks, youknowriad, peterwilsoncc, joemcgill, chrisdavidmiles, wpscholar, annezazu, chanthaboune, desrosjbot. See #60417. git-svn-id: https://develop.svn.wordpress.org/trunk@57517 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Fix plural forms parsing in `WP_Translation_File`. Ensures the plural expression from the translation file header is correctly parsed. Prevents silent failures in the attempt to create the plural form function. Adds additional tests. Props Chouby. See #59656. git-svn-id: https://develop.svn.wordpress.org/trunk@57518 602fd350-edb4-49c9-b593-d223f7449a82 * I18N: Add type declaration to new method missed in [57518]. See #59656. 
git-svn-id: https://develop.svn.wordpress.org/trunk@57519 602fd350-edb4-49c9-b593-d223f7449a82 * Administration: Accessibility: Use the default cursor style for labels and disabled form controls. The native cursor style for labels and form controls is `default`, which is the platform-dependent default cursor. Typically an arrow. Historically, WordPress always used the `pointer` style for all form controls and labels. While this isn't standard, there is some value in using the `pointer` style for form controls. However, labels should use the default style especially when the associated controls are disabled. Additionally, makes sure the disabled styling works for form controls with an `aria-disabled="true"` attribute. Props joedolson, afercia. Fixes #59733. git-svn-id: https://develop.svn.wordpress.org/trunk@57520 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Add `allowed_blocks` field to block registration and REST API There is a new block.json field called allowedBlocks, added in Gutenberg in https://github.com/WordPress/gutenberg/pull/58262. This adds support for this new field also on the server. Props: gziolo, jsnajdr. Fixes #60403. git-svn-id: https://develop.svn.wordpress.org/trunk@57521 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Use strict comparison for functions lookup in plugin/theme editors. Follow-up to [10607], [44617]. Props upadalavipul. See #60415. git-svn-id: https://develop.svn.wordpress.org/trunk@57522 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Some improvements to the Props Bot workflow. This makes a few improvements made to the Props Bot workflow: - The bot will no longer run on draft PRs. - The bot will no longer run on closed PRs. - The bot will no longer run when a comment is deleted (this should almost never happen). Props mamaduka, gziolo. See #60417. git-svn-id: https://develop.svn.wordpress.org/trunk@57523 602fd350-edb4-49c9-b593-d223f7449a82 * Media: enable AVIF support. 
Add support for uploading, editing and saving AVIF images when supported by the server. Add 'image/avif' to supported mime types. Correctly identify AVIF images and sizes even when PHP doesn't support AVIF. Resize uploaded AVIF files (when supported) and use for front end markup. Props adamsilverstein, lukefiretoss, ayeshrajans, navjotjsingh, Tyrannous, jb510, gregbenz, nickpagz, JavierCasares, mukesh27, yguyon, swissspidy. Fixes #51228. git-svn-id: https://develop.svn.wordpress.org/trunk@57524 602fd350-edb4-49c9-b593-d223f7449a82 * Media: fix AVIF tests. Follow up to r57524. Properly add AVIF images for unit tests. Fixes #51228. git-svn-id: https://develop.svn.wordpress.org/trunk@57525 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Refactor the way block bindings sources are handled It fixes the coding style issues reported. It goes further and improves the code quality it other places where the logic for block bindings was added. Follow-up for [57514]. Props: gziolo, mukesh27, youknowriad, santosguillamot. See #60282. git-svn-id: https://develop.svn.wordpress.org/trunk@57526 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Reset parser state after seeking to bookmark. When parser states were introduced, nothing in the `seek()` method reset the parser state. This is problematic because it could leave the parser in the wrong state. In this patch the parser state is reset so that it's properly adjusted on the successive call to `next_token()`. Developed in https://github.com/WordPress/wordpress-develop/pull/6021 Discussed in https://core.trac.wordpress.org/ticket/60428 Follow-up to [57211] Props dmsnell, kevin940726 Fixes #60428 git-svn-id: https://develop.svn.wordpress.org/trunk@57527 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Fix typo setting the wrong self-closing flag. The HTML Processor tracks whether a token was found with the self-closing flag. Depending on the context, this flag may or may not indicate that the element is self closing. 
Unfortunately it's been tracking the wrong flag: it's been tracking the end-tag flag, which indicates that a token is an end tag. In this patch the right flag is set in the HTML Processor. This hasn't been an issue because the HTML Processor doesn't yet read that stored flag, but it's an important fix to make before adding support for foreign content (SVG and MathML) since that behavior depends on reading the correct flag. Follow-up to [56274]. Props dmsnell. git-svn-id: https://develop.svn.wordpress.org/trunk@57528 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Use strict comparison in `wp-admin/update-core.php`. Follow-up to [11273], [25784], [54654]. Props wpfy, mukesh27, azaozz, viralsampat. Fixes #58061, #60415. git-svn-id: https://develop.svn.wordpress.org/trunk@57529 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Rename the `$ID` parameter to `$post_id` in `trackback()`. This resolves a few WPCS warnings: {{{ Variable "$ID" is not in valid snake_case format, try "$i_d" }}} See #59650. git-svn-id: https://develop.svn.wordpress.org/trunk@57530 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Mock plugin API response in `WP_REST_Plugins_Controller_Test`. Avoid false test failures due to network conditions in the `WP_REST_Plugins_Controller_Test` class. This mocks HTTP responses from the plugin information endpoint for the link-manager plugin. Props: peterwilsoncc, costdev. See #59647. git-svn-id: https://develop.svn.wordpress.org/trunk@57531 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Rename the `$expires_offset` variable in `cache_javascript_headers()`. This resolves a WPCS warning: {{{ Variable "$expiresOffset" is not in valid snake_case format, try "$expires_offset" }}} Follow-up to [4109], [21996]. See #59650. git-svn-id: https://develop.svn.wordpress.org/trunk@57532 602fd350-edb4-49c9-b593-d223f7449a82 * Script Loader: Remove unused `WP_Scripts::get_unaliased_deps()` method. 
This private method was introduced in [56033] / #12009 but it's not actually used. It was part of the inline script implementation which was later reverted before final merge. The method can be safely removed because it’s private and cannot be used by extenders. Props joemcgill. Fixes #60438. git-svn-id: https://develop.svn.wordpress.org/trunk@57533 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Update the `codecov/codecov-action` action. This updates the `codecov/codecov-action` from version `3.1.5` to `4.0.1`. Version 4 switches to using the Codecov CLI to upload test report date, and changes the version of Node.js used for the action to 20.x. This fixes the notices currently shown for the test coverage workflow. Props: mukesh27. See #59658. git-svn-id: https://develop.svn.wordpress.org/trunk@57534 602fd350-edb4-49c9-b593-d223f7449a82 * General: Add tests for `array_is_list` polyfill added in r57337. Props costdev. See #55105. git-svn-id: https://develop.svn.wordpress.org/trunk@57535 602fd350-edb4-49c9-b593-d223f7449a82 * Build/Test Tools: Pass a token to the Codecov action. Version 4 of the action now requires a token to be provided in order to upload coverage results. Follow up to [57534]. Props swissspidy. See #59658. git-svn-id: https://develop.svn.wordpress.org/trunk@57536 602fd350-edb4-49c9-b593-d223f7449a82 * Upload: Fallback to `PclZip` to validate ZIP file uploads. `ZipArchive` can fail to validate ZIP files correctly and report valid files as invalid. This introduces a fallback to `PclZip` to check validity of files if `ZipArchive` fails them. This introduces the new function `wp_zip_file_is_valid()` to validate archives. Follow up to [57388]. 
Props audunmb, azaozz, britner, cdevroe, colorful-tones, costdev, courane01, endymion00, feastdesignco, halounsbury, jeffpaul, johnbillion, jorbin, jsandtro, karinclimber, kevincoleman, koesper, maartenbelmans, mathewemoore, melcarthus, mujuonly, nerdpressteam, olegfuture, otto42, peterwilsoncc, room34, sayful, schutzsmith, stephencronin, svitlana41319, swissspidy, tnolte, tobiasbg, vikram6, welaunchio. Fixes #60398. git-svn-id: https://develop.svn.wordpress.org/trunk@57537 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Rename the `$oSelf` variable in `WP_MatchesMapRegex::apply()`. This resolves a WPCS warning: {{{ Variable "$oSelf" is not in valid snake_case format, try "$o_self" }}} Follow-up to [11853], [38376]. See #59650. git-svn-id: https://develop.svn.wordpress.org/trunk@57538 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Introduce the Font Library post types and low level APIs. This is the first step towards adding the font library to WordPress. This commit includes the font library and font face CPTs. It also adds the necessary APIs and classes to register and manipulate font collections. This PR backports the font library post types and low level APIs to Core. This is the first step to include the font library entirely into Core. Once this merged, we'll open a PR with the necessary REST API controllers. Props youknowriad, get_dave, grantmkin, swissspidy, hellofromtonya, mukesh27, mcsf. See #59166. git-svn-id: https://develop.svn.wordpress.org/trunk@57539 602fd350-edb4-49c9-b593-d223f7449a82 * Editor: Fix Font Library PHP unit tests. These font assets files used in phpunit tests were missing in the original commit [57539]. Props mukesh27. See #59166. git-svn-id: https://develop.svn.wordpress.org/trunk@57540 602fd350-edb4-49c9-b593-d223f7449a82 * Coding Standards: Fix array key alignment after [57539]. See #59166. 
git-svn-id: https://develop.svn.wordpress.org/trunk@57541 602fd350-edb4-49c9-b593-d223f7449a82 * HTML API: Join text nodes on invalid-tag-name boundaries. A fix was introduced to the Tag Processor to ensure that contiguous text in an HTML document emerges as a single text node spanning the full sequence. Unfortunately, that patch was marginally over-zealous in checking if a "<" started a syntax token or not. It used the following: {{{ <?php if ( 'A' <= $c && 'z' >= $c ) { ... } }}} This was based on the assumption that the A-Z and a-z letters are contiguous in the ASCII range; they aren't, and there's a gap of several characters in between. The result of this is that in some cases the parser created a text boundary when it didn't need to. Text boundaries can be surprising and can be created when reaching invalid syntax, HTML comments, and more hidden elements, so semantically this wasn't a major bug, but it was an aesthetic challenge. In this patch the check is properly compared for both upper- and lower-case variants that could potentially form tag names. {{{ <?php if ( ( 'A' <= $c && 'Z' >= $c ) || ( 'a' <= $c && 'z' >= $c ) ) { ... } }}} This solves the problem and ensures that contiguous text appears as a single text node when scanning tokens. Developed in https://github.com/WordPress/wordpress-develop/pull/6041 Discussed in https://core.trac.wordpress.org/ticket/60…
Trac ticket: Core-60170
Companion port into Gutenberg: WordPress/gutenberg#58107 (contains additional porting code)
This PR provides full tokenization scanning of an HTML document. This is being added into the Tag Processor and will be a necessary component for a number of related changes to the HTML API:
- Enables syntax-aware processing such as `wp_truncate_html()` [gist]
- Replaces/incorporates chunked/extended processing in #5050
- Replaces/incorporates stopping at comments in dmsnell#7
- Provides critical functionality for inner/outer getter/setter in [dmsnell#10, #4965]
- Depends on #5721 ✅
- Depends on #5725 ✅
Todo
- […] `$this->bytes_already_parsed` assignments and make sure they are proper. I think half of them are off by one.
- […] `MATCHED_TAG | TEXT_NODE` and `INCOMPLETE | COMPLETE`, which could simplify some logic that's spread in `if` statements.
- […] `<!--->`.
- […] `>`, not the closing `]]>` or the closing `?>`. So we can find all HTML comments, and then determine if they would have been a CDATA or PI Node if HTML supported those.
- […] `<?for-each?>` from `<--for-each-->`.
Design Changes
In this change we're introducing two features stemming from two internal changes:
- `next_token()` provides the ability to scan every token in the HTML stream.
The internal changes powering this are:
For example, when encountering an HTML comment the parser will track the following token information:
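A minimal sketch of that bookkeeping, assuming (offset, length) pairs for both regions. `token_starts_at` and `token_length` are the property names from the preparatory patch; `text_starts_at`, `text_length`, and the helper `sketch_comment_spans()` are hypothetical names used only for illustration, and the sketch handles only a well-formed `<!-- ... -->` comment, not the invalid forms the real parser has to recognize.

```php
<?php
// Hypothetical helper, not part of the HTML API: find the first well-formed
// HTML comment and describe the whole token and its inner text region as
// (offset, length) pairs.
function sketch_comment_spans( string $html ): ?array {
	$token_starts_at = strpos( $html, '<!--' );
	if ( false === $token_starts_at ) {
		return null; // No comment opener in the input.
	}

	$text_starts_at = $token_starts_at + 4; // Skip past the `<!--` opener.
	$closer_at      = strpos( $html, '-->', $text_starts_at );
	if ( false === $closer_at ) {
		return null; // No `-->` closer; this sketch gives up here.
	}

	return array(
		'token_starts_at' => $token_starts_at,
		'token_length'    => $closer_at + 3 - $token_starts_at,
		'text_starts_at'  => $text_starts_at, // Assumed name.
		'text_length'     => $closer_at - $text_starts_at, // Assumed name.
	);
}

$html  = '<p>Hi</p><!-- note --><b>';
$spans = sketch_comment_spans( $html );

// The full token, delimiters included.
echo substr( $html, $spans['token_starts_at'], $spans['token_length'] ), "\n";
// Only the text region between the delimiters.
echo substr( $html, $spans['text_starts_at'], $spans['text_length'] ), "\n";
```

Both regions feed straight into `substr()`, which is the practical reason for storing lengths instead of end offsets.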
Not every token will have a text region, but it's important to track the entire token and any text region because similar tokens may have different syntax. For example, an invalid comment is still a comment.
This holds for tokens whose entire content is text, such as with the `#text` node.
Special HTML tags have modifiable text that isn't part of `.textContent` or `.innerText`. For example, the `TITLE` element contains no HTML inside of it: everything is plaintext, and its contents don't appear in the page. The same is true for `TEXTAREA`, `SCRIPT`, `STYLE`, and a few more elements.
Scanning tokens
In order to keep the `next_tag()` interface and use clear, it is left unchanged. For operations needing access to the token stream, there is no built-in query mechanism and querying ought to be performed inside a `next_token()` loop.
`get_token_type()` indicates what kind of token is currently matched, `get_token_name()` returns something that more closely matches what a DOM API would return, and `get_modifiable_text()` returns the modifiable text if available.
TODO
- `next_token()` method to scan each token.
- `SCRIPT`, `STYLE`, `TITLE`, `TEXTAREA`, etc…
- `SCRIPT` tags and other tags with special closing rules. These are currently handled by skipping to the end of the element when finding the starting tag, but this has introduced a few challenges and bugs (for example, the Tag Processor fails to stop at a `<title>` tag if the document ends before the `</title>` closer is found).
- `rewind()` method to reverse to the start of the document.
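For reference, a `next_token()` loop of the kind described under "Scanning tokens" could look roughly like the following. This is a sketch and not code from this patch: `sketch_collect_tokens()` is a made-up helper name, and it assumes only that the processor exposes `next_token()`, `get_token_type()`, `get_token_name()`, and `get_modifiable_text()` as described above. The exact values those getters return are not shown here because they depend on the final implementation.

```php
<?php
/**
 * Hypothetical helper: walks every token and records what the three
 * getters report for it. Any filtering ("querying") would happen inside
 * this loop, since there is no built-in query mechanism for tokens.
 */
function sketch_collect_tokens( $processor ): array {
	$tokens = array();

	while ( $processor->next_token() ) {
		$tokens[] = array(
			'type' => $processor->get_token_type(),
			'name' => $processor->get_token_name(),
			'text' => $processor->get_modifiable_text(),
		);
	}

	return $tokens;
}

// Only runs inside WordPress, where the Tag Processor class is loaded.
if ( class_exists( 'WP_HTML_Tag_Processor' ) ) {
	$processor = new WP_HTML_Tag_Processor( '<title>Hi</title><!-- note --><p>text</p>' );

	foreach ( sketch_collect_tokens( $processor ) as $token ) {
		printf( "%s\t%s\t%s\n", $token['type'], $token['name'], $token['text'] );
	}
}
```

Collecting into an array keeps the example short; real code would more likely act on each token inside the loop so that it never holds the whole document's tokens in memory.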