The behavior for blank node parsing changed in later versions of PHP. #403

Closed
charlie-curtis opened this issue Apr 11, 2024 · 2 comments

Comments

@charlie-curtis
Contributor

charlie-curtis commented Apr 11, 2024

Description

When upgrading from PHP8.1.22 to PHP8.3.4, the output differs between the two versions, specifically in how blank (whitespace-only) nodes are handled.

Example

Input

<table>
	<caption>
		Cool table
	</caption>
	<tfoot>
	<tr>
		<th>I can do so much!</th>
	</tr>
	</tfoot>
	<tr>
		<td style="font-size:16pt;
      color:#F00;font-family:sans-serif;
      text-align:center;">Wow</td>
	</tr>
</table>

PHP8.1.22 output

<table><caption>
		Cool table
	</caption>
	<tfoot><tr><th>I can do so much!</th>
	</tr></tfoot><tr><td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
	</tr></table>

PHP8.3.4 output

<table>
	<caption>
		Cool table
	</caption>
	<tfoot>
	<tr>
		<th>I can do so much!</th>
	</tr>
	</tfoot>
	<tr>
		<td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
	</tr>
</table>

Impact

A strong case can be made that the PHP8.3.4 output is "more correct", and I wouldn't argue. The issue is that a lot of existing code and applications may be relying on the old behavior in order to "work". Having an optional backwards-compatible solution would ease the transition as many upgrade beyond PHP8.1.

Investigation

These steps have been performed:

  • verified that both PHP versions used for testing have the same version of libxml (2.9.1)
  • localized the behavior change to the loadHTML call here
  • verified that passing the LIBXML_NOBLANKS option fixed the output discrepancy

I think this php-src commit changed the default behavior of "blank" parsing from "don't keep" to "keep".
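
For anyone who wants to check this on their own PHP build, here is a minimal sketch that parses a small table with and without LIBXML_NOBLANKS and prints both serializations. The helper name `serializeBody` and the sample markup are made up for illustration and are not taken from the library; the exact whitespace in the output may depend on the PHP and libxml versions, so no expected output is shown.

```php
<?php
// Sketch only: compare DOMDocument::loadHTML() with and without LIBXML_NOBLANKS.
// serializeBody() is a hypothetical helper written for this comparison.
function serializeBody(string $html, int $options): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps fragments in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$html = "<table>\n\t<tr>\n\t\t<td>Wow</td>\n\t</tr>\n</table>";

$default  = serializeBody($html, 0);
$noBlanks = serializeBody($html, LIBXML_NOBLANKS);

echo "PHP ", PHP_VERSION, ", libxml ", LIBXML_DOTTED_VERSION, "\n";
echo "--- default options ---\n", $default, "\n";
echo "--- LIBXML_NOBLANKS ---\n", $noBlanks, "\n";
```

Running this under both PHP versions and diffing the two "default options" sections should show whether the default handling of blank nodes differs on a given setup.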

Suggested Fix

Much like LIBXML_PARSEHUGE is an optional configuration value that can be supplied here, I propose adding LIBXML_NOBLANKS as an optional value in order to better handle backwards compatibility as mentioned above without impacting existing use cases.
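
A rough sketch of what I have in mind, assuming two boolean settings feed the options bitmask passed to loadHTML. The function name `buildLibxmlOptions` and its parameters are placeholders for illustration, not the library's actual code or directive names.

```php
<?php
// Hypothetical helper: combine optional libxml flags into one bitmask.
// Both flags default to off, so existing callers keep their current behavior.
function buildLibxmlOptions(bool $allowParseHuge, bool $removeBlanks): int
{
    $options = 0;

    // LIBXML_PARSEHUGE is only defined with libxml >= 2.7.0.
    if ($allowParseHuge && defined('LIBXML_PARSEHUGE')) {
        $options |= LIBXML_PARSEHUGE;
    }

    // Opt-in: ask libxml to drop blank nodes, approximating the older output.
    if ($removeBlanks) {
        $options |= LIBXML_NOBLANKS;
    }

    return $options;
}
```

The result would then be passed as the second argument, e.g. `$doc->loadHTML($html, buildLibxmlOptions(false, true));`. Keeping the new flag off by default means nothing changes for anyone who does not opt in.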

Similar issues

#237
#269

@ezyang
Owner

ezyang commented Apr 11, 2024

Sgtm send the pr

@charlie-curtis
Contributor Author

@ezyang thanks, the PR is here: #404
