Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Stream rewrite URLs in a remote WXR file
Brings together a few explorations to stream-rewrite site URLs in a WXR file coming from a remote server. All of that with no curl, DOMDocument, or other PHP dependencies. It's just a few small libraries built with WordPress core in mind: * [AsyncHttp\Client](WordPress/blueprints#52) * [WP_XML_Processor](WordPress/wordpress-develop#6713) * [WP_Block_Markup_Url_Processor](https://github.com/adamziel/site-transfer-protocol) * [WP_HTML_Tag_Processor](https://developer.wordpress.org/reference/classes/wp_html_tag_processor/) Here's what the rewriter looks like: ```php $wxr_url = "https://raw.githubusercontent.com/WordPress/blueprints/normalize-wxr-assets/blueprints/stylish-press-clone/woo-products.wxr"; $xml_processor = new WP_XML_Processor('', [], WP_XML_Processor::IN_PROLOG_CONTEXT); foreach( stream_remote_file( $wxr_url ) as $chunk ) { $xml_processor->stream_append_xml($chunk); foreach ( xml_next_content_node_for_rewriting( $xml_processor ) as $text ) { $string_new_site_url = 'https://mynew.site/'; $parsed_new_site_url = WP_URL::parse( $string_new_site_url ); $current_site_url = 'https://raw.githubusercontent.com/wordpress/blueprints/normalize-wxr-assets/blueprints/stylish-press-clone/wxr-assets/'; $parsed_current_site_url = WP_URL::parse( $current_site_url ); $base_url = 'https://playground.internal'; $url_processor = new WP_Block_Markup_Url_Processor( $text, $base_url ); foreach ( html_next_url( $url_processor, $current_site_url ) as $parsed_matched_url ) { $updated_raw_url = rewrite_url( $url_processor->get_raw_url(), $parsed_matched_url, $parsed_current_site_url, $parsed_new_site_url ); $url_processor->set_raw_url( $updated_raw_url ); } $updated_text = $url_processor->get_updated_html(); if ($updated_text !== $text) { $xml_processor->set_modifiable_text($updated_text); } } echo $xml_processor->get_processed_xml(); } echo $xml_processor->get_unprocessed_xml(); ```
- Loading branch information