-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid double-processing post content when parsing block template HTML #3549
base: trunk
Are you sure you want to change the base?
Avoid double-processing post content when parsing block template HTML #3549
Conversation
$content = wptexturize( $content ); | ||
$content = convert_smilies( $content ); | ||
$content = shortcode_unautop( $content ); | ||
$content = wp_filter_content_tags( $content ); | ||
$content = wp_filter_content_tags( $content, 'the_block_template' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is a bit unrelated, but since here this function is called outside of any filter, it is crucial to manually provide a $context
. I went with the_block_template
for now (similar to e.g. the_content
), but certainly open to alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this in another commit. No need to block this change as of this ticket.
I can see three major flaws with this implementation:
There is an alternative implementation at WordPress/gutenberg#44995 that avoids all of the above and is much cleaner. |
@carlomanf Thanks for sharing these. Some replies:
To be clear, I'm not saying the last 2 things are impossible to happen, but they are extreme edge cases to the degree that makes them irrelevant (<0.1% of sites). |
@carlomanf I have pushed additional commits which address your points 1. and 2. While I don't think 2. is a significant concern, it's still trivial to address so I think it's worth doing. The fix to point 1. led me to another neat code cleanup here, as I've now replaced the global with a simple local variable that cannot be tampered with outside of the intended usage (when parsing the |
Points 2 and 3 that I raised can also be combined, so one of the shortcode callbacks could add Why not use a string that is generated uniquely at runtime, such as with Also, this solution does not stop double filtering when one post content is embedded inside another post content, which is a far more likely occurrence than the other scenarios and is handled easily by WordPress/gutenberg#44995. How feasible do you think it would be to modify this solution to handle it? |
Providing a quick update here, I've found another simpler fix for the lazy-loading problem from https://core.trac.wordpress.org/ticket/56930 specifically and opened #3560. Therefore I've removed one unit test from here which is now part of that PR. This PR here remains valid for fixing https://core.trac.wordpress.org/ticket/55996. |
My previous comment remains, if you want to break this feature, you can. Let's be realistic here, what are the chances that there is a shortcode in an FSE template that wraps a
We can potentially combine this with the current approach. Using
What is a realistic use-case to include a |
Imagine there is a shortcode that wraps some content, performs a word count on it, and prints out the word count appended to the raw content. Currently, if a post content block is added within the shortcode, the post content is included in the word count. After this patch is applied, the shortcode will begin to receive the placeholder and the post content will no longer be included in the word count. I don't know of a particular shortcode that does the above or something like it, but I wouldn't underestimate the use of legacy features and I wouldn't at all be surprised if the bug gets reported at some point. For example, WordPress/gutenberg#35676 is a separate but similar issue that shows how widely plugin/theme authors are using shortcodes together with blocks and discovering unexpected behaviour. The issue with the hypothetical word count shortcode would actually be worse, because it would not be the unexpected behaviour of a new feature, but an unexpected change in behaviour after several release cycles given that shortcode support was added in 5.9. The root cause of all of these bugs is that the block paradigm that was ostensibly intended to be a clean slate ended up receiving support for legacy features, some of which date back to the b2/cafelog days, and neither was ever designed to be compatible with the other. There is probably no way forward here that isn't going to break at least something, and quoting obviously fabricated statistics like "<0.1% of sites" is not going to be helpful, especially not after the bug reports inevitably roll in. The main difference between WordPress/gutenberg#44995 and this patch is as follows:
|
@carlomanf I agree with you in that there is a risk of potential breakage in this change. I do however still think that the risk for what you're pointing out to come up as a real-world bug is extremely low. It obviously does not make sense to argue about this because we can't know until it's out there. My take here specifically is that the chance of this causing a bug is so small that we can add it to core, basically "wait and see". If my assessment here shows to be wrong, and we get bug reports of this, I'll happily revert and then we have to look for alternatives. We also have beta period for that purpose (though even if this is in a stable release, we could revert it if it became a problem). The implementation from WordPress/gutenberg#44995 has other problems, specifically for the |
$pos = strpos( $content, $post_content ); | ||
if ( false !== $pos ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use str_contains
here?
// Ensure a placeholder is used that is not already present in the content. | ||
$content_placeholder = '<!-- wp-post-content-placeholder-%d -->'; | ||
$existing_placeholder_count = 0; | ||
while ( preg_match( '/' . sprintf( $content_placeholder, '[0-9]+' ) . '/', $content ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love using a while loop and regex ( preg_match ) here. Is there no way we could do a single regex and loop through all the matching blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it should be possible with preg_match_all()
.
$content = str_replace( | ||
array_map( | ||
function( $i ) use ( $content_placeholder ) { | ||
return sprintf( $content_placeholder, $i ); | ||
}, | ||
array_keys( $parsed_post_content ) | ||
), | ||
$parsed_post_content, | ||
$content | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not say this is specially readable. Any chance we could just use a simple for each loop here? It would mean not requiring this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea. Also, while not huge in real terms, using foreach()
will be ~15-30% faster.
foreach ( $parsed_post_content as $item_number => $replacement ) {
$content = str_replace(
sprintf( $content_placeholder, $item_number ),
$replacement,
$content
);
}
}, | ||
PHP_INT_MAX | ||
); | ||
|
||
$content = $wp_embed->run_shortcode( $_wp_current_template_content ); | ||
$content = $wp_embed->autoembed( $content ); | ||
$content = do_blocks( $content ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this?
remove_filter( 'the_excerpt', 'wp_filter_content_tags' );
remove_filter( 'the_content', 'wp_filter_content_tags' );
$content = do_blocks( $content );
add_filter( 'the_excerpt', 'wp_filter_content_tags' );
add_filter( 'the_content', 'wp_filter_content_tags' );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this?
remove_filter( 'the_excerpt', 'wp_filter_content_tags' ); remove_filter( 'the_content', 'wp_filter_content_tags' ); $content = do_blocks( $content ); add_filter( 'the_excerpt', 'wp_filter_content_tags' ); add_filter( 'the_content', 'wp_filter_content_tags' );
WordPress/gutenberg#44995 does exactly what you are suggesting. @felixarntz informed me that it does not actually fix the problem, and nor does your similar implementation #3833.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He did not seem at all interested in assisting with a solution based on filter removal, even after I informed him that his implementation would break shortcodes. I just don't have the familiarity with wp_filter_content_tags
to know how to implement it correctly, and all I can tell is that the $context
variable needs to be set to 'the_content'
for it to work as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted in my PR, even after applying this PR, filters are applied double, as image filter is applied in template_part funciton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not talking about the same thing. It's not because of the template part, it's because there are conditions that require the the $context
variable to be set to 'the_content'
rather than 'the_block_template_html'
, and I'm not sure of the ramifications of changing the condition to allow a different value or explicitly passing 'the_content'
to make the condition pass. If @felixarntz is willing to assist in this area then we might be able to use WordPress/gutenberg#44995 as the solution instead of putting up with his messy implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this?
@spacedmonkey Removing the filters around the do_blocks()
call is only a partial solution. This "record before, reinstate after" approach is the only one that solves both "double execution problem" and also the "avoid all execution problem" as discussed at length in 55996.
If I need to open another ticket on Trac so that the "avoid all execution problem" is considered of equal importance to the "double execution problem" then I guess I'll do that :) In the meantime, this approach is the one that works for both problems.
// Replace all post content within the template with a temporary placeholder so that the post content is not | ||
// unnecessarily processed again. See https://core.trac.wordpress.org/ticket/55996. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Replace all post content within the template with a temporary placeholder so that the post content is not | |
// unnecessarily processed again. See https://core.trac.wordpress.org/ticket/55996. | |
/* | |
* Replace all post content within the template with a temporary placeholder so that the post content is not | |
* unnecessarily processed again. See https://core.trac.wordpress.org/ticket/55996. | |
*/ |
Should use the multiline comment format here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @felixarntz and was glad to see it resolved the issue with shortcodes reported in another ticket. Nice work!
I've left some thoughts above/below.
@@ -226,4 +264,109 @@ public function test_resolve_home_block_template_no_resolution() { | |||
|
|||
$this->assertNull( $template ); | |||
} | |||
|
|||
/** | |||
* @ticket 55996 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have test descriptions beginning with "Tests" per convention, and covers annotations?
/**
* Tests that X does/does not Y when Z
*
* @ticket 55996
*
* @covers
*/
Same for other tests in the PR as well.
$this->assertStringContainsString( '<img data-attr="value" ', $html ); | ||
$this->assertStringNotContainsString( '<img data-attr="value" data-attr="value" ', $html ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test methods with multiple assertions should have a final $message
parameter to explain what went wrong should an assertion fail. Same for other tests in the PR as well.
} | ||
|
||
public static function wpTearDownAfterClass() { | ||
wp_delete_post( self::$post_id, true ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the parent method deletes all posts. Curious, do other tests in the test suite fail without this line?
// Ensure a placeholder is used that is not already present in the content. | ||
$content_placeholder = '<!-- wp-post-content-placeholder-%d -->'; | ||
$existing_placeholder_count = 0; | ||
while ( preg_match( '/' . sprintf( $content_placeholder, '[0-9]+' ) . '/', $content ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it should be possible with preg_match_all()
.
It is possible, encouraged, and mentioned in this video at 5–6 minutes, 18 minutes and 32 minutes. It is also mentioned in this article and this article. Handling post content within post content is essential for this issue to be solved, and currently, this patch does nothing to handle it. |
get_the_block_template_html()
calls various (partially expensive) functions to parse the block template content.post-content
block, which itself parses post content using similar functions (which has historically been the established behavior).wp_content_img_tag
filter to make a modification, you now have to consider that this filter may run twice on the same piece of content).Trac ticket: https://core.trac.wordpress.org/ticket/55996
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.