Issue in importing WXR files #1211

Closed
aplamada opened this issue Apr 8, 2024 · 2 comments · Fixed by #1213
Labels
[Feature] Import Export [Type] Bug An existing feature does not function as intended

Comments

aplamada commented Apr 8, 2024

The content was generated in https://playground.wordpress.net/?theme=twentytwentythree&wp=6.5&php=8.2&plugin=wordpress-importer&plugin=inseri-core, where a new post was created and then exported as issue.xml.

Expected behaviour:

  1. Download issue.xml
  2. Open https://playground.wordpress.net/?theme=twentytwentythree&wp=6.5&php=8.2&plugin=wordpress-importer&plugin=inseri-core
  3. Admin UI: Tools -> Import -> Run Importer -> WordPress Run Importer -> Select the issue.xml file downloaded previously
  4. Check the Issue post and notice the "#test" content

Issue in Query API import-wxr

  1. Try the xml: https://playground.wordpress.net/?theme=twentytwentythree&wp=6.5&php=8.2&plugin=wordpress-importer&plugin=inseri-core&import-wxr=https://raw.githubusercontent.com/inseri-swiss/inseri-playground/main/issue/issue.xml
  2. Select the Issue post. Notice that the block shows "is loading...". In edit mode, you can try Attempt Block Recovery; the content then looks like "u0022#testu0022".

The issue is also present in the blueprint.

It used to work fine until the end of last week. fad3ccf might be related to it.

@adamziel adamziel added [Type] Bug An existing feature does not function as intended [Feature] Import Export labels Apr 8, 2024
@adamziel adamziel added this to the Zero Crashes milestone Apr 8, 2024
Collaborator

adamziel commented Apr 8, 2024

Great report @aplamada, thank you! I just proposed a fix in #1213

Author

aplamada commented Apr 8, 2024

Thanks @adamziel !

adamziel added a commit that referenced this issue Apr 8, 2024
Preserves the backslashes in the content imported through `importWxr` by
calling `wp_slash` on the entire imported data.

This issue started after the recent switch to the
`humanmade/wordpress-importer` in
#1192. Turns out
that importer doesn't call `wp_slash` on its own.

Closes #1211
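
The failure mode described in this commit message can be modeled outside PHP. The sketch below is a simplified TypeScript stand-in for what `wp_slash` and `wp_unslash` do to strings. The helper names `slash` and `unslash` and the sample attribute value are illustrative, not Playground code, and it assumes the post insertion path unslashes whatever it receives:

```typescript
// Simplified stand-ins for PHP's addslashes()/stripslashes(), which
// wp_slash()/wp_unslash() apply to strings. Illustrative only.
function slash(value: string): string {
  // Prefix every backslash and quote with a backslash.
  return value.replace(/[\\'"]/g, (match) => '\\' + match);
}

function unslash(value: string): string {
  // Drop each escaping backslash and keep the character that follows it.
  return value.replace(/\\([\s\S]?)/g, (_match, next: string) => next);
}

// Hypothetical block attribute value with JSON-escaped quotes,
// i.e. the six-character sequence \u0022 around "#test".
const imported = '\\u0022#test\\u0022';

// Without slashing first, unslashing eats the backslashes.
const broken = unslash(imported);

// With slashing first, the round trip preserves them.
const preserved = unslash(slash(imported));

console.log(broken);    // u0022#testu0022
console.log(preserved); // \u0022#test\u0022
```

The first result matches the "u0022#testu0022" content reported in this issue. The second shows why slashing the imported data before insertion keeps the escaped quotes intact.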
adamziel added a commit that referenced this issue Dec 11, 2024
…2058)

## Description

Adds the Data Liberation WXR importer as an option in the `importWxr`
step. The new importer is turned on by including the `"importer":
"data-liberation"` option:

```json
{
  "steps": [
    {
      "step": "importWxr",
      "file": {
        "resource": "url",
        "url": "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml"
      },
      "importer": "data-liberation"
    }
  ]
}
```

When the `importer` option is missing or set to `"default"`, nothing
changes in the behavior of the step and it continues using the
https://github.com/humanmade/WordPress-Importer importer.
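
As a rough illustration of that fallback, here is a minimal TypeScript sketch. The type and function names (`WxrImporter`, `ImportWxrStep`, `resolveImporter`) and the example.com URL are hypothetical and are not taken from the Blueprints package; only the `importer` option and its two values come from the description above:

```typescript
// Hypothetical types for illustration; the real step definition may differ.
type WxrImporter = 'default' | 'data-liberation';

interface ImportWxrStep {
  step: 'importWxr';
  file: { resource: 'url'; url: string };
  importer?: WxrImporter;
}

// A missing `importer` option is treated the same as "default".
function resolveImporter(step: ImportWxrStep): WxrImporter {
  return step.importer ?? 'default';
}

const legacyStep: ImportWxrStep = {
  step: 'importWxr',
  file: { resource: 'url', url: 'https://example.com/export.xml' },
};

const newStep: ImportWxrStep = { ...legacyStep, importer: 'data-liberation' };

console.log(resolveImporter(legacyStep)); // default
console.log(resolveImporter(newStep));    // data-liberation
```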

The new importer:

* Rewrites links in the imported content
* Downloads assets through Playground's CORS proxy
* Parallelizes the downloads
* Communicates progress

This PR is a part of
#1894

## Implementation details

This `importWxr` step fetches and includes the
`data-liberation-core.phar` file. The phar file is built with
[Box](https://box-project.github.io/box/configuration/) and contains the
importer library with its dependencies, which is a subset of the Data
Liberation library, a subset of the Blueprints library, and a few vendor
libraries.

This, unfortunately, means that any changes in the PHP files require
rebuilding the .phar file. Here's how you can do it:

```bash
nx build:phar playground-data-liberation
```

You can also build the entire Data Liberation package as a WordPress
plugin complete with a wp-admin page:

```bash
nx build:plugin playground-data-liberation
```

Both commands will output the built files to
`packages/playground/data-liberation/dist`.

The progress updates are a first-class feature of the new importer. The
updated `importWxr` step receives them in real-time via a
`post_message_to_js()` call running after every import step. Then, it
passes them on to the progress bar UI.
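
The receiving side of that flow could look roughly like the TypeScript sketch below. It assumes the PHP side sends a JSON string with a `type` of `"import-progress"`, a numeric `progress` fraction, and an optional `caption`. That message shape and the function names are assumptions made for illustration, not the payload this PR actually sends:

```typescript
// Hypothetical message shape; the real payload may differ.
interface ImportProgressMessage {
  type: 'import-progress';
  progress: number; // fraction in the range 0..1
  caption?: string;
}

// Parses a raw message and returns it only if it looks like a progress update.
function parseProgressMessage(raw: string): ImportProgressMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_error) {
    return null; // Not JSON, so not a progress update.
  }
  if (typeof data !== 'object' || data === null) return null;
  const candidate = data as Record<string, unknown>;
  if (candidate.type !== 'import-progress') return null;
  if (typeof candidate.progress !== 'number') return null;
  return {
    type: 'import-progress',
    // Clamp to 0..1 so a bad value cannot overflow the progress bar.
    progress: Math.min(1, Math.max(0, candidate.progress)),
    caption: typeof candidate.caption === 'string' ? candidate.caption : undefined,
  };
}

// Forwards valid updates to a progress bar callback and ignores everything else.
function forwardProgress(
  raw: string,
  setProgress: (fraction: number, caption?: string) => void
): boolean {
  const message = parseProgressMessage(raw);
  if (message === null) return false;
  setProgress(message.progress, message.caption);
  return true;
}

// Usage with an in-memory stand-in for the progress bar UI.
const seen: Array<{ fraction: number; caption?: string }> = [];
const record = (fraction: number, caption?: string): void => {
  seen.push({ fraction, caption });
};

forwardProgress('{"type":"import-progress","progress":0.5,"caption":"Importing posts"}', record);
forwardProgress('not a progress message', record);
```

Validating the message before forwarding it keeps unrelated messages from the PHP runtime away from the progress bar.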

### Other changes

* **TLS traffic now goes through the CORS proxy.** Since the new
importer uses `AsyncHTTP\Client` which deals with raw sockets,
Playground's [TLS-based network
bridge](#1926)
runs the outbound traffic through a CORS proxy. Technically,
`TCPOverFetchWebsocket` gets the `corsProxy` URL passed to the
`playground.boot()` call.
* A few composer dependencies were forked, downgraded to PHP 7.2 using
Rector, and bundled with this PR to keep the Data Liberation importer
working.

## Remaining work

- [x] PHP 7.2 compatibility. Done by forking and Rector-downgrading
dependencies that were incompatible with PHP 7.2.
- [x] Report the importer's progress on the overall Blueprint progress
bar
- [x] Enqueue the data liberation plugin files for downloading at the
blueprint compilation stage
- [x] Don't eagerly rewrite attachments URLs in `WP_Stream_Importer`.
Exposing this information to the API consumer requires an explicit
decision. Do we rewrite it? Or do we ignore it?
- [x] Fix the TLS errors at the intersection of Playground network
transport and the async HTTP client library
- [x] Separate the markdown importer and its dependencies (md parser,
frontmatter parser, Symfony libraries) from the core plugin
- [x] Ship the importer and its tree-shaken deps (URL parser) as a
minified zip/phar

## Follow-up work

- [ ] Reconsider the `WP_Import_Session` API – do we need so many
verbosely named methods? Can we achieve the same outcomes with fewer
methods?
- [ ] Investigate why there's a significant delay before media downloads
start on PHP 7.2 – 7.4. It's likely a PHP.wasm issue.

## Testing instructions

* Default importer – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20})
and confirm it does what the current `importWxr` step does: it
stays at "Importing content" for a moment, fails to fetch media files
(CORS issues in network tools), but inserts posts and pages.
* Data Liberation – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22importer%22:%20%22data-liberation%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20}),
confirm the import progress is visible and that the content and media
indeed get imported:

![CleanShot 2024-12-08 at 14 54
49@2x](https://github.com/user-attachments/assets/a7da3244-a10f-43d2-8e94-43d305220a7e)
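
Test links like the two above carry the Blueprint JSON in the URL fragment. A minimal TypeScript sketch for generating them follows. The helper names are hypothetical, and the use of `encodeURI`/`decodeURI` is an assumption about how the fragment is decoded, not a description of Playground's internals:

```typescript
// Builds a test link that carries the Blueprint JSON in the URL fragment.
function buildBlueprintLink(baseUrl: string, blueprint: object): string {
  return baseUrl + '#' + encodeURI(JSON.stringify(blueprint));
}

// Reads the Blueprint back from such a link (used here to check the round trip).
function readBlueprintFromLink(link: string): unknown {
  const fragment = link.slice(link.indexOf('#') + 1);
  return JSON.parse(decodeURI(fragment));
}

// Mirrors the Data Liberation test Blueprint from the testing instructions.
const testBlueprint = {
  plugins: [],
  steps: [
    {
      step: 'importWxr',
      importer: 'data-liberation',
      file: {
        resource: 'url',
        url: 'https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml',
      },
    },
  ],
  preferredVersions: { php: '8.3', wp: '6.7' },
  features: { networking: true },
  login: true,
};

const testLink = buildBlueprintLink('http://localhost:5400/website-server/', testBlueprint);
console.log(testLink);
```

Dropping the `importer` property from the step yields the link for the default importer.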

## Related issues

* #1211 
* #2012 
* #1477 
* #1250 
* #1780