Inconsistent output across PHP versions #269

Closed
liamkeily opened this issue Sep 24, 2020 · 3 comments

liamkeily commented Sep 24, 2020

I've noticed a strange inconsistency in HTML Purifier's output: the same script produces different results on different machines and PHP builds. Any idea what this could be related to?

Script:

<?php
require __DIR__ . '/../vendor/autoload.php';

$html = <<<HTML
<h1>Test</h1>
<h2>Test 2</h2>
<p>This is a paragraph
This is a new line
Another new line</p>
<ul>
<li>bullet</li>
<li>bullet 2</li>
</ul>
<p><img src="imagesrc.png" alt="img" /></p>
<p><a href="https://www.google.com">Hyperlink</a></p>
HTML;

$output = (new HTMLPurifier)->purify($html);
echo md5($output) . PHP_EOL . $output;

Ubuntu 18.04.4 LTS (Dev VM)
PHP 7.4.8 (cli) (built: Jul 13 2020 16:45:47) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.8, Copyright (c), by Zend Technologies

<h1>Test</h1>
<h2>Test 2</h2>
<p>This is a paragraph
This is a new line
Another new line</p>
<ul><li>bullet</li>
<li>bullet 2</li>
</ul><p><img src="imagesrc.png" alt="img" /></p>
<p><a href="https://www.google.com">Hyperlink</a></p>

(md5 3966db7c2db30e0e63f566ac4a01632d)

--

Ubuntu 18.04.4 LTS (Dev VM)
PHP 7.4.10 (cli) (built: Sep 9 2020 06:36:14) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.10, Copyright (c), by Zend Technologies

<h1>Test</h1>
<h2>Test 2</h2>
<p>This is a paragraph
This is a new line
Another new line</p>
<ul>
<li>bullet</li>
<li>bullet 2</li>
</ul>
<p><img src="imagesrc.png" alt="img" /></p>
<p><a href="https://www.google.com">Hyperlink</a></p>

(md5 f4b6f3065f0adb5ae6ab3e45f2380586)

--

Ubuntu 18.04.5 LTS (CI Server)
PHP 7.4.10 (cli) (built: Sep 22 2020 10:00:08) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies

<h1>Test</h1>
<h2>Test 2</h2>
<p>This is a paragraph
This is a new line
Another new line</p>
<ul><li>bullet</li>
<li>bullet 2</li>
</ul><p><img src="imagesrc.png" alt="img" /></p>
<p><a href="https://www.google.com">Hyperlink</a></p>

(md5 3966db7c2db30e0e63f566ac4a01632d)
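
The md5 sums only say that the outputs differ, not where. A minimal sketch of a helper that reports the first differing byte and shows the surrounding text with newlines made visible follows. The names firstDifference and context are made up for illustration and are not part of HTML Purifier, and the two sample strings are shortened fragments modelled on the outputs above.

```php
<?php
/**
 * Return the byte offset of the first difference between two strings,
 * or null when they are identical.
 */
function firstDifference(string $a, string $b): ?int
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // No mismatch in the shared prefix: equal strings, or one is a prefix of the other.
    return strlen($a) === strlen($b) ? null : $limit;
}

/** Return a slice starting shortly before $offset, with whitespace escaped. */
function context(string $s, int $offset, int $radius = 10): string
{
    $start = max(0, $offset - $radius);
    return addcslashes(substr($s, $start, 2 * $radius), "\n\r\t");
}

// Shortened fragments modelled on the two kinds of output in this report.
$withNewline    = "</p>\n<ul>\n<li>bullet</li>";
$withoutNewline = "</p>\n<ul><li>bullet</li>";

$offset = firstDifference($withNewline, $withoutNewline);
echo "first difference at byte {$offset}" . PHP_EOL;
echo context($withNewline, $offset) . PHP_EOL;
echo context($withoutNewline, $offset) . PHP_EOL;
```

Running the same comparison on the full outputs saved from each machine should show whether the only differences are the newlines after the <ul> and </ul> tags.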

ezyang (Owner) commented Sep 24, 2020

Usually this is due to differences in the version of libxml shipped with PHP, which we use for parsing.

liamkeily (Author) commented Sep 24, 2020

The two PHP versions that produce different output report the same libxml versions from php -i | grep 'libxml'. Could the libxml builds still be different?

PHP 7.4.10 (cli) (built: Sep 9 2020 06:36:14) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.10, Copyright (c), by Zend Technologies

libxml Version => 2.9.10
libxml
libxml2 Version => 2.9.10
libxslt compiled against libxml Version => 2.9.4

PHP 7.4.8 (cli) (built: Jul 13 2020 16:45:47) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.8, Copyright (c), by Zend Technologies


libxml Version => 2.9.10
libxml
libxml2 Version => 2.9.10
libxslt compiled against libxml Version => 2.9.4

ezyang (Owner) commented Sep 24, 2020

Oh, that is fairly strange. If you want to debug this, try printing the intermediate HTML after each HTML Purifier phase to localize where the difference first shows up.
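
One way to organize that kind of phase-by-phase comparison is sketched below. It is a generic harness, not HTML Purifier code: traceStages and firstDivergence are hypothetical names, and the 'parse' and 'serialize' closures are placeholders standing in for whatever real phases get instrumented. The idea is to hash the intermediate result after every stage on each machine, then find the first stage where the hashes disagree.

```php
<?php
/**
 * Run $input through each named stage in order and record an md5 of
 * every intermediate result.
 *
 * @param array<string, callable> $stages stage name => function(string): string
 * @return array<string, string> stage name => md5 of that stage's output
 */
function traceStages(string $input, array $stages): array
{
    $trace = ['input' => md5($input)];
    $current = $input;
    foreach ($stages as $name => $stage) {
        $current = $stage($current);
        $trace[$name] = md5($current);
    }
    return $trace;
}

/** Name of the first stage whose hash differs between two traces, or null. */
function firstDivergence(array $traceA, array $traceB): ?string
{
    foreach ($traceA as $name => $hash) {
        if (($traceB[$name] ?? null) !== $hash) {
            return $name;
        }
    }
    return null;
}

// Placeholder stages: machine B's serializer drops newlines between tags.
$stagesOnMachineA = [
    'parse'     => fn(string $s): string => trim($s),
    'serialize' => fn(string $s): string => $s,
];
$stagesOnMachineB = [
    'parse'     => fn(string $s): string => trim($s),
    'serialize' => fn(string $s): string => str_replace(">\n<", '><', $s),
];

$html = "<ul>\n<li>bullet</li>\n</ul>\n";
$a = traceStages($html, $stagesOnMachineA);
$b = traceStages($html, $stagesOnMachineB);

echo json_encode($a, JSON_PRETTY_PRINT) . PHP_EOL;
echo 'first divergence: ' . (firstDivergence($a, $b) ?? 'none') . PHP_EOL;
```

In practice each machine would run traceStages once and print its trace as JSON, and the two traces would then be compared by hand or with firstDivergence.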
