Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional callback argument to serialize_block(s) #5187

Closed

Conversation

ockham
Copy link
Contributor

@ockham ockham commented Sep 11, 2023

For some operations, we need to traverse the parsed block tree (before it is eventually serialized). In order to minimize the number of tree traversals (which is a potentially costly operation), it seems that adding a callback function argument to serialize_block() -- which inevitably has to traverse the entire tree anyway -- is a natural fit.

Examples where this could come in handy:

Spun off from #5158.

Trac ticket: https://core.trac.wordpress.org/ticket/59327


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@ockham ockham self-assigned this Sep 11, 2023
@ockham ockham requested a review from gziolo September 11, 2023 16:53
@ockham ockham marked this pull request as ready for review September 11, 2023 16:56
@dlh01
Copy link

dlh01 commented Sep 12, 2023

Hi! Please forgive my dropping in with a few comments, but this PR was interesting to me.

As a regular reader of the block API, I found the $callback parameter somewhat unclear. In the context #5158, of course, it makes more sense. Otherwise, $callback seems just as likely to apply at the end of the function as to the beginning. Perhaps it acts as replacement logic for the whole function. A longer name like $tree_callback or something might help disambiguate.

Next, after thinking about the variable name, I wondered why this function would introduce a dedicated parameter for what most other functions would make accessible as a filter — pre_serialize_block or something. My instinct as a developer at an agency would be that if core has a need to modify the behavior of serialize_block(), doing so with a filter would be more in keeping with the extensibility of the project and preserve the ability for developers to integrate with core.

Still, I can think of some reasons against a filter, such as being wary about adding a filter to a hot code path, a desire to keep the new behavior in #5158 "private," or disliking the pattern of adding and removing one-off filters. So that leads to my final thought, which is, rather than add a callback to serialize_block(), create a separate function for traversing the block tree, and call that before serialize_blocks(), e.g.

$pattern            = $this->registered_patterns[ $pattern_name ];
$blocks             = parse_blocks( $pattern['content'] );
$visitor            = _parsed_block_visitor( $pattern );
$blocks             = map_block_tree( $visitor, $blocks ); // Here
$pattern['content'] = serialize_blocks( $blocks );

This new function could be @access private if you thought that was important, although I can think of some cases where we could have used such a function with our clients.

Regardless, a new function would keep an added responsibility out of serialize_block(s)() and still allow you to modify the block tree without interference when Block Hooks required doing so.

Thanks for considering it!

src/wp-includes/blocks.php Outdated Show resolved Hide resolved
@gziolo
Copy link
Member

gziolo commented Sep 12, 2023

@dlh01, you shared great feedback in your comment about $callback. It's a difficult topic as serialize_blocks needs to be extremely performant.

This new function could be @access private if you thought that was important, although I can think of some cases where we could have used such a function with our clients.

Regardless, a new function would keep an added responsibility out of serialize_block(s)() and still allow you to modify the block tree without interference when Block Hooks required doing so.

This would work, but that would require traversing the tree of blocks twice. You would essentially have to recreate recursion inside map_block_tree. What's interesting, a very similar helper is already present in the code: _inject_theme_attribute_in_block_template_content.

My instinct as a developer at an agency would be that if core has a need to modify the behavior of serialize_block(), doing so with a filter would be more in keeping with the extensibility of the project and preserve the ability for developers to integrate with core.

Yes, that would definitely be my favorite approach. I'm very hesitant to go this path though, as it's hard to predict the consequences. In this case, it's safer to start small and allow modifications only when loading the templates and patterns from the files on the server, as we know it will get processed only once. In fact, Block Hooks feature will most likely include a filter as part of the flow so plugin devs can cover more advanced use cases where they have specific limitations for automatically inserting blocks.

Anyway, let's keep the discussion going as we implement the basic functionality to enable the feature for WP 6.4 beta 1.

@ockham
Copy link
Contributor Author

ockham commented Sep 12, 2023

Hey @dlh01, thanks a lot for your thoughtful feedback! I think you're making a number of great points there, so it's much appreciated 😄

As a regular reader of the block API, I found the $callback parameter somewhat unclear. In the context #5158, of course, it makes more sense. Otherwise, $callback seems just as likely to apply at the end of the function as to the beginning. Perhaps it acts as replacement logic for the whole function. A longer name like $tree_callback or something might help disambiguate.

Yeah, I found the naming part hard. Eventually, I went with the fairly generic $callback, as is also used by somewhat similar PHP functions such as array_map.

I'm a bit reluctant to use $tree_callback as it seems to put too much focus on the tree aspect 🤔 $block_callback also comes to mind, but to me, that evokes more something like a render_block callback... Naming is hard! 😅 Maybe I can solve this by a better argument description in the PHPDoc?

Next, after thinking about the variable name, I wondered why this function would introduce a dedicated parameter for what most other functions would make accessible as a filter — pre_serialize_block or something. My instinct as a developer at an agency would be that if core has a need to modify the behavior of serialize_block(), doing so with a filter would be more in keeping with the extensibility of the project and preserve the ability for developers to integrate with core.

Still, I can think of some reasons against a filter, such as being wary about adding a filter to a hot code path, a desire to keep the new behavior in #5158 "private," or disliking the pattern of adding and removing one-off filters.

Yes, that was pretty much my thought process! I'll add that the main reason against a filter was that serialize_block is a fairly low-level function that's used in a variety of different contexts, so it seemed risky to require the user to remove the filter after serialization to make sure it wouldn't be accidentally applied when running serialization in a different context. Finally, if we have different parts of the codebase add different filters for serialize_blocks (to be used in different contexts), it could also get a bit hard to tell which ones apply to a certain code path, since the addition and removal of filters isn't necessarily colocated with the serialize_block() call.

So that leads to my final thought, which is, rather than add a callback to serialize_block(), create a separate function for traversing the block tree, and call that before serialize_blocks(), e.g. [...]

I also considered this option 😅 but ultimately decided against. Two reasons here:

  • Efficiency: It would duplicate tree traversal, which is a potentially costly operation. As stated in the PR description, serialize_block has to recursively traverse the tree; if we need to traverse it for another purpose, we might as well do it here 😄
  • Established pattern: There's arguable some precedent for a pattern like this -- application of a function to every element of a data structure while it is being traversed (the basic one being once again array_map). In cases like these, languages like PHP (but also JS) tend to accept the callback as a function argument.

FWIW, I was considering still some other alternatives/variations (e.g. even re-implementing serialize_blocks itself using a variation of your map_block_tree function, and/or accepting an array of callbacks rather than just an individual one) but eventually landed on the present approach since it seemed to strike a good balance between the requirements described above, and pretty good backwards-compatibility (via adding a new argument to an existing function). It also allowed keeping serialize_blocks perfectly generic and ignorant of applications such as hooked block insertion, as that can be completely contained in the callback.

With all that said, I could see us adding a generic filter to serialize_block like you suggested later if a strong need arises, but for the time being, I'd rather go with the callback argument, for the above reasons 😊

Edit: I only saw that @gziolo already replied to a similar effect after I submitted my comment 😅

Copy link
Member

@gziolo gziolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. We still might want to improve the part related to how the new $callback param is called, or exercise a different pattern for fulfilling the same task as discussed in the comments. Let's ensure it doesn't delay the process of integration the Block Hooks into WP 6.4 beta 1.

@ockham
Copy link
Contributor Author

ockham commented Sep 12, 2023

Thank you @gziolo!

I'll make one minor change to the PHPDoc for $callback to clarify what it's for.

@ockham
Copy link
Contributor Author

ockham commented Sep 12, 2023

Committed to Core in https://core.trac.wordpress.org/changeset/56557/.

@ockham ockham closed this Sep 12, 2023
@ockham ockham deleted the add/serialize-blocks-callback-arg branch September 12, 2023 13:01
@ockham
Copy link
Contributor Author

ockham commented Sep 12, 2023

BTW here's a basic first change that's going to leverage the new callback arg: #5192 😊

(Test coverage to guard against regressions is being added in #5186.)

@dlh01
Copy link

dlh01 commented Sep 12, 2023

Thanks @ockham and @gziolo for your comments! (And thanks @ockham for the generous props.)

Both of you are right, of course, that a separate function would require traversing the block tree again, which is not optimal. I would say in response, first, that a separate function leaves the door open to refactoring the logic in both functions to reduce the duplication with minimal effects on backwards compatibility.

Second, the performance effect of traversing the tree twice would be felt in only the locations where the new function was called, whereas the current implementation commits core to checking is_callable() before serializing every block everywhere. On a client project that we recently finished, for example, serialize_blocks() is called three times just within the scope of the_content as part of fulfilling various requirements for the site. I don't have the numbers to show it one way or the other, but I wonder whether these is_callable()s running all the time would soon add up to more than the cost of the traversals running some of the time, an effect that would reach our projects, at least.

@ockham
Copy link
Contributor Author

ockham commented Sep 13, 2023

Thanks @ockham and @gziolo for your comments! (And thanks @ockham for the generous props.)

Both of you are right, of course, that a separate function would require traversing the block tree again, which is not optimal. I would say in response, first, that a separate function leaves the door open to refactoring the logic in both functions to reduce the duplication with minimal effects on backwards compatibility.

But if it needs serialization in addition to another operation, it will always traverse the tree at least twice, won't it? I'd really rather avoid that as a lower boundary for file-based template processing, when it's clear that it is avoidable.
(Furthermore, I think that there's a risk that with this pattern, people will end up introducing another full tree traversal for each additional operation they would like to perform on the tree, rather than "merging" that operation into an existing traversal.)

Second, the performance effect of traversing the tree twice would be felt in only the locations where the new function was called, whereas the current implementation commits core to checking is_callable() before serializing every block everywhere. On a client project that we recently finished, for example, serialize_blocks() is called three times just within the scope of the_content as part of fulfilling various requirements for the site. I don't have the numbers to show it one way or the other, but I wonder whether these is_callable()s running all the time would soon add up to more than the cost of the traversals running some of the time, an effect that would reach our projects, at least.

That's fair enough (although my hunch is that is_callable will be a fairly quick check). I would still like to continue to stick with the API of a callback argument for serialize_blocks, and to avoid a lower boundary of two tree traversals for file-based templates.

I believe that it should be easy enough to optimize the chosen approach: If the is_callable call proves too costly, we could do something like the following:

function serialize_blocks( $blocks, $callback = null ) {
	$result = '';
	if ( is_callable( $callback ) ) {
		foreach ( $blocks as $block ) {
			$result .= serialize_block( $block, $callback );
		};
	} else {
		$result .= _serialize_block_without_callback( $block );
	}
	return $result;
}

With _serialize_block_without_callback defined like the "old" (pre-callback) serialize_block. A similar check could be added to serialize_block (singular). AFAICS, that would limit the overhead to just one is_callable check.

@gziolo
Copy link
Member

gziolo commented Sep 14, 2023

I checked out of the curiosity whether this change impacted performance metrics reported in trunk with https://github.com/WordPress/wordpress-develop/actions/runs/6159812895/job/16715609415:

Screenshot 2023-09-14 at 10 35 20

It gets compensated with the further refactoring as documented in #5192 (comment).

@dlh01
Copy link

dlh01 commented Sep 17, 2023

But if it needs serialization in addition to another operation, it will always traverse the tree at least twice, won't it? I'd really rather avoid that as a lower boundary for file-based template processing, when it's clear that it is avoidable. (Furthermore, I think that there's a risk that with this pattern, people will end up introducing another full tree traversal for each additional operation they would like to perform on the tree, rather than "merging" that operation into an existing traversal.)

I think you sort of answered your own question 😄 Traversing the tree twice is definitely avoidable. In my original comment, I suggested a separate function to apply the callback throughout the tree before serialization occurs. Another way of achieving the goal of minimizing the impact to serialize_block(s) would be to reverse your suggestion of a new function that serializes without a callback and create a _serialize_block(s)_with_callback().

Both new functions would, though, put us back to where we started, which is that the logic to recurse through the tree would be recreated.

I think at the heart of my original comment was the sense that it would be preferable to accept that duplicated logic in a new function that served the specific purposes presented by block hooks, compared to permanently encoding it in the main serialize_block(s) functions.

A new internal function can be deprecated and forgotten as the block API takes on new use cases (for example, if it becomes necessary to merge multiple operations into a traversal), but the new arguments to the existing functions will live on.

Anyway: This is fun to talk about, but I know you have other things to work on, so we can leave it here. I appreciate you indulging me in this discussion and will be happy to talk about it more if it ever comes up again!

@ockham
Copy link
Contributor Author

ockham commented Sep 19, 2023

@dlh01 Just a quick update: Since we need to add a bit more logic, we'll be reverting the change to serialize_block(s) and will introduce a new traverse_and_serialize_block(s) instead. Thanks again for the discussion, it was insightful feedback that eventually influenced this decision! 😄

@gziolo
Copy link
Member

gziolo commented Sep 19, 2023

I opened #5246, which addresses the changes mentioned by @ockham.

@dlh01
Copy link

dlh01 commented Sep 20, 2023

Very interesting, @ockham — thanks for the heads up!

On the subject of sharing code between the functions, the decorator pattern might become helpful as the block API evolves. Here's an example implementation that I was experimenting with after our initial discussion, based on the original changes to serialize_block(s). WordPress doesn't really go for interfaces, but in this example I think they would allow for sharing the innerContent loop without conditionals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants